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ABSTRACT 

This p*P tr advances an approach for state estimation and identification of spatially 
distributed parameters embedded in static distributed (elliptic) system models. 

The method of maximum likelihood is used to find parameter values that maximize 
a likelihood functional for the system model, or equivalently, that minimize the 
negative logarithm of this functional. To find the minimum, a Newton- Raphson search 
is conducted that from an initial estimate generates a convergent sequence of 
parameter estimates. Central to the numerical search are a gradient functional and a 
Hessian operator, which are respectively the first and second function-space 
derivatives of the negative-log likelihood functional with respect to the parameter 
distributions being identified. For simplicity, a Gauss-Markov approach is used to 
approximate the Hessian in terms of products of first derivatives. The gradient and 
approximate Hessian are computed by first arranging the negative -log likelihood 
functional into a form based on the square-root factorization of the predicted 
covariance of the measurement process. The resulting data-processing approach, 
referred to here by the new term of predicted-data- covariance square -root filtering. 
makes the gradient and approximate Hessian calculations very simple. Since the 
parameter estimates are only approximations to the actual parameter values, there is a 
parameter estimation error inherent in the estimation process. Cramer- Rao bounds are 
obtained for the covariance of the estimation error in terms of the information 
operator associated with the likelihood functional. These error covariance bounds are 
then used to outline methods for optimal input design. 

A closely related set of state estimates is also produced by the maximum likelihood 
method: smoothed estimates that are optimal in a conditional mean sense and filtered 
estimates that emerge from the predicted-data- covariance square-root filter. The 
terms "smoothed" and "filtered" are used because the formulas which generate these 
estimates, when expressed in operator notation, are symbolically very similar to those 
used in filtering and smoothing for linear dynamical systems. A key similarity is the 
presence of a predictor- corrector structure containing estimator gains that, as in a 
Kalman filter, can be expressed in terms of the state estimation error covariances. In 
addition, a residual process can be defined, in the usual way, as the difference between 
the actual data and the predicted data obtained from the filtered state estimate. The 
residuals have properties nearly identical to those of an innovations process: the 
residuals are white with a unit covariance; and the residuals and measurements can be 
obtained from each other by means of reciprocal linear transformations. Because these 
transformations are not Volterra (causal), the residuals are not a bona fide innovations 
process. Even though they are not a true innovations process, the residuals are very 
useful, because they lead to state and parameter estimation schemes for elliptic 
systems that retain conceptually the simplicity of those obtained by the innovations 
approach to filtering, smoothing and identification for linear dynamical systems. 
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1 . INTRODUCTION AND SUMMARY 


The elliptic models considered in this paper can be cast as 

A(0)u(0) = B(0)o + C(0)f, (1.1) 

y * H(0)n(0) + n, (1.2) 

where A is a formally self-adjoint elliptic differential operator defined over the spatial 
domain Q; B and C are appropriately dimensioned operators that model the influence of 
the process error u and the input f on the state U{ H is an operator that characterises 
the state-to-observations map; u and n are white-noise model errors forming the model 
error vector c-[o,n]; and f is a deterministic input. Examples of the application of such 
models to the problem of static shape determination of large space structures are 
contained in Ref. [1]. 

The central aim here is to develop a maximum-likelihood approach to the 
estimation of the parameters 0 (these parameters could in general be spatially 
distributed) by using the data y and the system model itself. It is assumed that the true 
value 6 q of the parameter 0 is a deterministic but poorly known quantity. The input f 

can be selected to optimize the data generated for estimation. A related but somewhat 
secondary aim is to develop a methodology for computation of the corresponding state 
estimates. 

A Formula for the Nexative-Loa Likelihood Ratio 

It will be shown in Sec. S that the negative-log likelihood functional is specified by 

J(6;y) * V. Tr log [I+R(0)J + */. [y-m(0)]*lUR(0)l _1 [y-m(0)] - V. y*y, (1.3) 

where 

m(0) * H(0)<t>(0)C(0)f and R(0) - H(0)d>(0)B(0)B*(0)<t>*(0)H*(0). 

The integral operator <W0) is related to A(0) by A(0)dK0)»I, with I the identity. The 
symbol <b* denotes the integral operator adjoint to <t> so that $*<0)A*<0)=I. It will also 
be shown in Sec. 3 that m(0) and R(0) are respectively the "suspected" mean and 
covariance of the data y, under the assumption that the model error vector c«(<o,n] is a 
spatially distributed white-noise process [1] with a covariance operator B(ee*)«I equal 
to the identity. To simplify Bq. (1.3). the following notation has been used: 

y*y. <y,y> # and [y-m(0)]*|I+R(0)J~ 1 [y-m(0)J - <y-m(0)], [UR(0)r 1 Iy-m(0)]> > , 

where <*. •> indicates an inner product in the function space to which the data belongs. 
Predicted- Data- Covariance Square- Root Form of the Likelihood Ratio 

A numb er of alternative formula* for the negative-log likelihood functional are 
developed in Sec. 4. To solve the above minimization problem, the most convenient 
formula is: 
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J(6sy) * Tr log[UE(6)] 4 V* z*(0)z(9) - z*(0)y, 


(1.4) 


where 


z(0) = L(0)y + lI-L(0)]m(0), (1.5) 

L(0) = I -[I+R(0)f V \ K(0) - [I+R(0)] V, -I. (1.6) 


Equation (1.5) can be viewed as specifying a filter, characterized by the operator L(0), 
that processes the data y and the suspected mean m(0) to provide a "filtered" state 
estimate z(0). This filter L(0) will hereafter be referred to as the predicted 
data -covariance square-root filter because the key calculation required to specify L(9), 
as in (1.6), is the evaluation of the square-root of the predlcted-data-covariance 
operator [UR(0)]. The equivalence between (1.3) and (1.4) can be established by 
substitution of (1.5) and (1.6) into (1.4). 

Note for later reference that the definitions in (1.6) imply that K(9) and L(0) are 
related by 

[I+K(0)) _l = I-L(0). (1.7) 

Furthermore, (1.7) implies that K(0) * L(0)+K(0)L(0) * L(0)+L(0)K(0). 

Gradient of the Likelihood Functional 


The gradient functional dj/d9, to be defined more completely in Secs. 5 and 6, is 
specified by 

3J/30 = Tr|OL/30)(I+K)] 4 (z-y)*«z/a0), (1.8) 


where 


3z/39 = (3L/36)y 4 (I-L)(3m/30), 


(1.9) 


with y = y-m, and 3L/30, dm/30 being the function-space Frechet derivatives of L and 
m. These equations can be obtained from (1.4) by function- space differentiation with 
respect to 9. 


The gradient functional 3J(0;y)/30 in (1.8) is the Frechet derivative [2) of the functional 
J with respect to the parameter 0. The derivative is a linear transformation (assumed 
to be bounded) that maps an admissible parameter perturbation 60 into the 
corresponding perturbation 6J(0,60;y) of the likelihood functional by means of the 
equation 6J(0,60;y) * (3j(0sy)/30] 60. Detailed computation of the function- space 
derivatives above is conducted in Sec. 6 using a perturbation analysis of the 
eigensystem of the covariance operator R ■ obtained in Sec. 5. 

Note that in Sec. 7 it will be established that 


Eiaj^yi/aoi ^ 

0*0 O * 0, 


( 1 . 10 ) 
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so that the expected value of the gradient vanishes at the optimal parameter value 9 0 . 
Hessian of the Likelihood Functional 


Similarly* differentiation of (1.8) leads to 

a’j/ae’ » Tr io a L/d9 a ) (i+K) ♦ OL/ae) OK/ae) i + (z-v)*o s z/39 a ) + 


(3z/39)*(dz/ 39), (1.11) 

and to its expected value at 9*9 of M(9 ) » B[3 a J/39 a ]U « , i.e., 

o o 


B(a a i/89 a j ie,e * Tr lOL/ae) a+ k) «l*/ 39)] ♦. B[Oa/ae)*o*/a9)]. d . 12 ) 


Furthermore* substitution of (1.9) in the last term of (1.12) leads to 

B[a a J/39 a J | e , e * 2Tr |OL/a0) a+R) OL*/39>] + [d - L) (3m/30)J* 

0 

[d - L) (dm/39)]. (1. IS) 

Note that the expected value of the Hessian operator d a J/86 a evaluated at 9*9. is a 

c 

sum of two terms each of which is positive definite. Consequently* in a probabilistic 

sense made precise by (1.13)* the likelihood functional is strictly convex in the vicinity 

of the optimal value 9*9 . Note that by definition M(9 ) in (1.13) is also the 

o o 

information operator associated with the likelihood functional. 

Newton- Ranhson Search for t >»« Optima l Parameter Estimates 

Since the problem of minimization of J(9$y) in (1.4) has no closed-form solution, it is 
necessary to consider iterative numerical search techniques for optimization. The 
following constitutes a function-space Newton- Raphson iteration: 

e n+1 *9 n -M‘ l g . <1.14) 

n n 

where g Q = 3J(9°5y)/39 is the gradient functional (1.8) evaluated at 9*9 n j and where 

Mq - Tr K3L/39) (I+R> OL*/39)J 4 (3z*/39) ( 3 z/ 39 )|q .0 (1.15) 

Q 

is an approximation to the Hessian operator 3 a J/39 a in (1.11). This approximation is 
obtained from (1.12) by replacing the second term B[(dz*/39)(dz/d9)) with the actual 
value ((3z*/39X3z/39)J obtained in a single realization. Under certain conditions, to be 
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examined in more detail in future work, the sequence 6 n specified by (1.14) converges 
to a local minimum of J(6;y), if the initial estimate used to start the search is 
sufficiently close to the optimal value. 


Cramer- Rao Bounds and Optimal Input Design 


The above numerical search results in an estimate 0 of the actual parameter value 0 Q . 


In Sec. 7, a C-R bound for the covariance B(6 6 *) of the estimation error 6 

P P P 


obtained from the inequality. 


0-6 is 
0 


E(0 0 *) 5 M - 1 (6 ), (1.16) 

P P o 

where the information opera or M(0 o ) is specified in (1.13). The related mean-square 
estimation error is bounded by £(0^*0^) 2 TrlM - *(0 Q )]. 

The information operator M(0 q ) can also be used to specify criteria for optimal input 

design. Perhaps the simplest optimal selection method to implement is that which 

seeks to maximize Tr [M(0 )] with respect to f, subject to the constraint that f satisfy 

o 

the normalization condition of f*f=l. This method results in an optimal f which is the 
eigenvector corresponding to the largest eigenvalue of a positive-definite matrix 
described in detail in Sec. 7. Other criteria for optimal selection based on calculation 

of M -1 (6^) may be more difficult to implement but usually lead to superior 

performance. 

The Corresponding State Estimates 

Related to the parameter estimation approach are the following two distinct state 

estimates (denoted by u and z ): 

o o 

u o * E(u/y) * 4>Cf + G(y-H«PCf), z q » <PCf 4 g(y-H<bCD, (1.17) 

where G and g are Kalman-like gains (see Sec. 8) specified by 


G = £ sin a a^Xj,<pj.* F g * £ (1-cosa^) Xj^*. (1.18) 

In these equations, are the eigenvectors of the operator R - so that 

R4>k* X a <|> k , with \ a being the related eigenvalue*. Also, and Xj, are defined by 
tano^ * \.j, and Xj, = \~ a <bBB*<P*H*^. 

The state estimate u Q » E(u/y) is defined as the conditional expectation of the state 
given the data y. Since u q is an optimal estimate of u based on the entire data set (as 
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opposed to s subset), u Q can be viewed as a best smoothed estimate. The other 

estimate, z o in (1.17), will be referred to as • filtered state estimate. Ths filtered 

estimate has no known probabilistic interpretation similar to u q - B(u/y) above. 

However, in spite of the apparent lack of probabilistic meaning, estimate is useful 
in simplifying the gradient and Hessian calculations in (1.8) and (1.11). It will be shown 
in Sec. 8 that z q in (1.17) and z, the estimate emerging from 

the predicted-data- covariance square-root filter, are related by z « Hz . Hence, z is 

o o 

a bona fide estimate of the entire state, whereas z - Hz is a partial estimate defined 

o 

only at the observation locations. 

Kalman-like Gains and Error Covariances 

The gains G and g in (1.17) can alternatively be specified in terms of the 

covariance of the state estimation error inherent in' u and z , i.e., 

o o 

G * PH*, g » pH*, (1.19) 

where 

P * E((u-u o ) (u-u o )*], p p* » E[(u-z o ) (u-z q )*]. (1.20) 

The corresponding mean-square state estimation error is 

E[(u-u o )*(u-u o )J - Tr [f], B[(u-z o )*(u-z o )] - Tr [p+p*J. (1.21) 

Furthermore, f and p are related by 

P = p + P - pH*Hp. (1.22) 

Since the term pH*Hp is non-negative, the mean-square estimation error associated 

with the smoothed estimate u is never larger than that of the filtered estimate z . 

o o 

Filtering end Stwnnthiwg 

While u and z have been defined some shat independently in (1.17), they are 
o o 

related by: 

u » 2 4 pH*e, (1.23) 

o o 

where 

e « y - Hz q - « - HpH*)y - (I - L) y (1.24) 

is the residual process defined as the difference between the data y and the 
observed-state estimate Hz Q . The symbol y in (1.24) denotes the mean-centered data 
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process y * y - H<*>Cf. It will be shown in Sec. 8 that (1.22) and (1.23) constitute a 
generalization to elliptic systems of the forward/backward sweep method for solution 
of smoothing problems in linear dynamical systems. 

The Residuals as a Pseudo- fawwtin ng Process 

The residuals in Bq. (1.24) have two properties that are similar (but not identical) 
to those of an innovations process: 

B(ee*) = I, (1.25) 

e = (I - L) y, y * (I ♦ K)e. (1.26) 

Eq. (1.25) reflects whit ness of the residuals. Bq. (1.26) states that the residual and 
mean-centered data processes e and y can be obtained from each other by means of 

reciprocal transformations, i.e., (I + E)~ x * (I - L) as in (1.7). Whiteness of the 
innovations and reciprocal relationships between innovations and measurements are the 
two central features of the innovations approach to least-squares estimation for linear 
dynamical systems. Bqs. (1.25) and (1.26) are similar to these conditions. However, 
there is a key difference: the transformations (I + K) and (I - L) in (1.26) are Fredholm 
operators whose domain is the entire measurement space. This is in contrast to the 
Volterra (causal) operators in the innovations approach for linear dynamical systems. 
The notion of causality is not even used in this paper, although such a notion can be 
defined for certain classes of elliptic systems [1]. Because of this difference the 
residual process is not a bona fide innovations process. However, the residual process is 
stili useful in obtaining the relatively simple formulas in (1.8) - (1.26) for filtering, 
smoothing and identification. 

Paper Outline 

This section has at a summary level addressed many of the fundamental issues involved 
in the maximum likek_.>od approach to estimation. The subsequent sections of the 
paper contain a more complete description of the above results. 

Section 2. Development of the mathematical framework -- including integral 
operator models, a priori covariance analysis with white-noise model errors, Predholm 
resolvents, and eigenfunction expansions — required to arrive at formula (1.3) for the 
likelihood functional and to evaluate the corresponding function-space gradient in (1.8) 
and the approximate Hessian in (1.15). 

Section 3 . Derivation of the negative-log likelihood functional in (1.3). This 
functional is the negative logarithm of the likelihood ratio, associated with the 
detection of a Gaussian signal in additive Gaussian noise, traditionally encountered in 
the theory for communication and signal detection. 

Section 4. Development of alternative formulas for the likelihood ratio, some of 
which are more convenient to use than (1.3) in implementing the numerical search for 
optimization — in particular, development of the predicted-data-covariance 
square- root filter form (1.4) upon which the Newton- Raphscn search is based. 
Additional forms of the likelihood ratio which are of interest in their own right 
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(although not subsequently used in the paper) are: a smoothing form expressed in terms 
of the best mean-square state estimate; an eigenaystem expansion form based on the 
eigenvalues and eigenvectors of the operator in (1.3); a trigonometric 

operator form with which most of the manipulations involved in the may; mum likelihood 
approach can be visualized using their similarities to simple trigonometric formulas fo* 
scalars. 

Section 5 . Development of a first-order perturbation analysis to evaluate the 
infinitesimal changes in the eigensystem of the operator R«H$BB*<h*H* in (1.3) due to 
similarly small changes 66 in the parameter distributions being identified. This is the 
central calculation required to compute the function-space gradients 3J/80, 3z/30, 
3L/39 and 3m/30 in (1.8) and (1.9). 

Section 6 . Calculation of the gradient functional and approximate Hessian of the 
likelihood functional based on the perturbation analysis of Sec. 5. These are the two 
calculations which are central to implementation' of the Newton- Kaphson search and 
which have been used as a basis for computer programs to Implement the maximum 
likelihood approach. 

Section 7 . Parameter estimation error covariance analysis and Cramer- Rao bounds 
based on explicit formulas for the Hessian (information) operator in (1.13). Outline of 
an optimal input design approach based on using the Cramer- Rao bound as an optimality 
criterion. 

Section 8 . Analysis of the filtered and smoothed state estimates embedded in the 
parameter estimation approach. Analysis of the predicted~dat« -covariance square-root 
filter resulting in Kalman-like formula? for the filter gain, evaluation of the state 
estimation error covariance, and relationships between filtered rod smoothed estimates. 

Section 9 . Summary and explanation of calculations required for implementation of 
the numerical search for the optimal estimates. 

Section 10 . Conclusions and directions for future work in the areas of development of 
as; iptotic properties of the estimates and of optimal input design. 

2. PRELIMINARIES: Notation, Integral Operator Model, Covariance Analysis, 
Predholm Resolvents, and Bigenf unction Expansions 

The aim of this section is to develop a set cf miscellaneous results that will be useful in 
subsequent sections in conducting detailed derivation cf: the negative-log likelihood 
functional in (1.3) to be minimized, the corresponding function-space gradient in (1.8), 
and the approximate Hessian operator in (1.15). The main results of the section can be 
summarized as follows: 

• conversion of the partial differential operator model in (1.1) to an 
equivalent integral operator formulation. This integral operator 
formulation is introduced because it simplifies the statement and solution 
of the estimation problems in (1.1) - (1.3). 
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evaluation of the observed state covariance operator in 

(1.3), under the assumption that c = [o. ,n] is a spatially distributed 
white-no se process with a unit covariance operator. Related to evaluation 
of this covariance operator R is the similar evaluati^ of the suspected 
mean m = ri<t>Cf in (1.3). 

evaluation of the dual observed-state covariance operator Q=3*<b*H*H<&B 
- which can be viewed as xhe covariance of the output of a system model 
dual to (1.1), unde the assumption that this dual system is driven by a 
white-noise process. 


• definition of two sets of eigenvectors and of R and Q above, with X* 

being a set of common eigenvalues, xhese two sets of vectors can be used 
to expand functions in the input space and the output space Hg. 

* 

• definition of u vectors x^_ and p^ in the state space and its dual IT ? , 

related to and <|>j, above by ^ * X” 1 d>B^ and Pj, « Xj, l $*H*0j,. These 

two sets of vectors and satisfy a boundary-value problem similar to 

those traditionally encountered as necessary and sufficient conditions for 
optimality in quad atic optimal control rnd estimation problems subject to 
linear constrain 

• analysis cf the basic relationship between R and Q above and their 

corresponding Fredholm resolvents P and S defined as P - I-(UR) 1 and 

S - I -d+Q) -1 . Exp if sion of the operators R, Q, P and S in terrrs of the 
eigenfunctions <t>^ and ^ defined above. 

• development of trigonometric operator forms for R and ?. These 
trigonometric forms allow development of ir* ’.resting trigonometric 
alternatives to (1.3) in evaluating the likelihood lu .ticnal. 

While the section concentrates on the development of a mathematical framework to be 
used in subsequent sections, many of the above results (such as tht trigonometric 
operator formulas for the covariance operators) are of interest in their own right, 
somewhat independently of their subsequent application. 

Hilbert Space Notation 

There are three Hilbert spaces of primary interest: the input space H^ to which 
the process error a and the deterministic input f belong; the state space containing 
the state u; and the measurement space H* where the data y and the observation error 
n belong. The inner product between two arbitrary elements u and v in the apace H^ Is 
denoted by <u,v> i or fry the simpler notation u*v * <u,v> i - Similarly, uv* denotes a 
Hilbert space outer product. 
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Conversion to Integral Operator Model 


It is convenient for subsequent developments to convert (1.1) to sn eqoivelent 
integral operator formulation, To this end, define the Green'* function $(x/£) of A as 
the solution of 

A x <Kx/C)*6<x-C), (2.1) 

where 6 is the impulsive delta function, and where the subscript x in A^ denotes that 

the spatial differentiations embedded in A are performed with respect to x (as opposed 
to being performed with respect to £)• Define then the integral operator <t> whose 
kernel is the Green' $ function, i.e., 

4>v = <J>(x/£)v<£)d£, (2.2) 

for all admissible functions v. Note that <t> is the integral operator such that A<t> = I, 
where I is the appropriately dimensioned identity. 

With these definitions at hand, it is possible to recast (1.1) and (1.2) as 

y = m(0) + H(0)4>(0)B(0)u + n, (2.3) 

where m(6) is the “suspected mean* 

m(0) = H(0)<t>(0)C(0)f. (2.4) 

Equation (2.3) can be cast into the following even more compact notation 

y = m(0) + h(0)c, (2.5) 

where c = [t»,n] is the model error vector [1], and h(0) is the operator h(0) * 
U«0XJ>(0)B(0) 1 1 ). 

Predicted- Data Mean and Covariance 


The evaluation of the predicted mean and covariance of y, needed as a preliminary 
step to arrive at (1.3), is based on the key assumption that the model error vector c * 
[o,n] is a zero- mean spatially distributed white-noise process whose covariance 
operator E(ee*) is the identity, i.e., 

E(££*) = I, (2.6) 

where I is the appropriately dimensioned identity. Note that thir assumption is not at 
all restrictive, because the more general case where the model errors c » [o>,nJ are 
correlated (with a nonidentity covariance operator) can be handled within the same 
formulation hy selection of the operator B in (1.1). It is assumed here that BB* is 
bounded and trace-class, with kernel b(x/{) satisfying Jq b(x/x)dx <°°. 
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Remark 2.1 


The process y is s random field with mean and covariance specified as 


E(y) = m(0), E{(y-m(6)] [(y-m(e)J*} * I+R(0), (2.7) 

where R(0) = H(0)d>(0)£(0)B*(0)O*(0)H*(0). That E(y) * m follows from (2.5) and the 
fact that c i* zero-mean. The second of Eqs. (2.7) follows from the following sequence 
of operations: B((y-m)(y-tn)*] = E(bee*h*) = hE(ee)h* * hh* = I+R. A more detailed 
development of the above result is contained in Ref. [1]. 

Remark 2.2 The process u in (1.1), representing the state of the system, is a random 
field with mean and covariance specified by 

E(u) = <DCf, E((u-<J>Cf)(u-<t>Cf)*J » R(0), (2.8) 

where 

R(0) = <t>(0)B(6)B*(0)<t>*(0). (2.9) 

Note that the state covariance R and the "observed-state” covariance R in (2.7) and 
(2.8) are related by 

R(0) = HR(0)H*. (2.10) 

Pemark 2 .°. The state covariance R satisfies the partial differential equation 

ARA* = BB*, (2.11) 

a result which can be established by pre- multiplication of R in (2.9) by A and 
subsequent post-multiplication by A*. 

Remark 2.4 The state covariance operator $ can be represented as the following 
integral operator 

Rv = r(x/f) v(f)df , (2.12) 

where the kernel r(x/£) satisfies 

A x r(x/f)A f » b(x/f), (2. IS) 

and where b(x/£) is the kernel of BB*. This result can be established by means of the 
following sequence of operations. Consider an admissible function v (admissible in the 
sense that it can be operated on by the operators AfiA* and BB* in (2.11) so that 
ARA*v = BB*v makes sense). In terms of the corresponding kernels r and b, this last 
equation becomes 

A x { J Q r(x/C> [A f *v(f)Jdf} - J Q [A x r(x/c>A f ] v(f)df . J ft b(z/C>v(f>dC, (2.14) 

where the first equality is valid because by definition A^* is the formal adjoint of A^. 
Since Eq. (2.14) must be valid for all admissible v, then (2.14) implies (2.13). 
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Remark 2.5 The state covariance kernel r in (2.12) can be expressed as 


tix/O = I Q J Q «b(x/n) wve) <*8/E> dude, (2.15) 

where <t> is the Green's function of A, and b is the kernel of BB*. This result can be 
established by expressing (2.9) in terms of the operator kernels ♦ and b of $ and BB* 
and by subsequent reversal of the order of integration. 

PwmjrV i t In the special case, of interest in many applications, where the process 
error <■> is discretely located at the M locations and the sensors are placed 

at the N locations [f^,... then R * H<t>BB*<b*H* is a matrix whose general element 

R^. is specified by 

M 

R..= l ♦(x i /n l )<Xu l /f J ), (2.16) 

1*1 


where the summation is taken over the disturbance locations. 

The Dual-Model Covariance Operators 

Closely related to R and R above are the "dual" operators defined as 

Q(O) * B*(exb*(e)H*(e)H(exwe)B(0), q (©> . weiiwemoywe). (2.i7> 

Note that Q and Q can be obtained, from R and R respectively, by making the 
substitutions <h-*b* and B-»H*. This observation can be used as the basis for defining the 
dual system model, illustrated in Fig. 2.1, whose state and output have covariance 
operators specified by Q and Q above. 
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Figure 2.1. Illustration of Primal and Dual Models 
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The primal system model is based on (2.3), with a, u and Hn denoting the process error, 
the system state, and the observed state respectively. For this model, 5 » B(uu*) is the 
covariance of the state, while R » HRH* is the corresponding covariance of the 
observed state. It is assumed for the sake of this discussion that the deterministic input 
f in (1.1) has been set to zero, so that the suspected mean m in (2.3) is zero also. With 
this assumption, it is not necessary to show m in the block diagram in Pig. 2.1, and the 
relationship between the primal and dual models is illustrated more easily. The dual 
system model is characterized by the dual operators 9*, d>* and B*, by the dual or 
adjoint state X, and by the observed dual state B*X. It is assumed that the dual model 
is driven by a unit- covariance white-noise process so that Eton*) * I. This input process 
n driving the dual model can be thought of as being the observation error process in 
(2.3). Por this dual model, § * B(XX*) and Q « B(B*XX*B) are respectively the 
covariances of the state X and the observed state B*X. Upon multiplication of X in 
Fig. 2.1 by A*, the following partial differential equations result to describe the dual 
model 


A*X = H*n. (2.18) 

Note that the dual state covariance Q * and the dual observed-state 

covariance Q = B*0*H*Hd>B are related by 

Q . B*QB. (2.19) 

In the same spirit used to arrive at (2.11M2.16), it is now possible to develop the 
following properties of the operator 0 and its corresponding kernel q. 

Remark 2-7 The dual-state covariance operator (J satisfies 

A*QA » H*H, (2.20) 

a result which can be obtained from Q * <b*H*H<b in (2.17) upon multiplication 
by A*( *)A. 

Remark 2.8 In terms of its kernel q- the operator Q can be expressed ss 

Qv - X Q q(x/£)v(C)dC, (2.21) 

where q satisfies the differential equation 

A^q(f/x)A x = h(£/x), (2.22) 

and where h(£/x) is the kernel of H*H. This result can be established by an approach 
quite similar to that used in arriving at (2.13). The symbbl v denotes again an 
admissible function defined to be admissible if (2.21) makes sense. 

2.9 The kernel q(?/x) of 0 can be expressed as 

q(£/x) - X Q f Q h(Tvtf) <*X/B) du dfi, (2.23) 
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where h is the kernel of H*H. This resole can be established in s meaner analogous to 
that nsed in arriving at (2.15>. 


Remark 2.10 In the special c- <e most typical in applications, where the range of H 
and B is finite-dimensional, the. Q » B*<b*H*H<bB is a matrix whose general element 
Q„ can be expressed as 

N 

Q i)“ £ q(x } /T V’ <2.24) 

k-l 

where the summation is taken over the set of sensor locations. 

Spectral Repre s entations 

Recall that R * H$BB*6*H* is the observed-state covariance operator in (2.7). 
The eigenvalues of R are defined as the nontrivial solutions of 


R<P k « X’dy (2.25) 

with 4> k being the corre sp onding eigenvectors. Note that, far cases where H has a 

finite-dimensional range, the operator R is an N-by-N matrix with a finite number of 
eigenvectors. In the more general case where the range of H is infinite -dimensional, 

then R is usually compact and X^-*0 as b*». In both of these cases, the following 

Mercer expansions hold for R and its kernel r 

R » Z ♦♦ and t(x/C> - E Xjg^xttJlf) (2.26) 


Furthermore, the normalized eigenvectors form an orthonormal basis for the 
observation space H^. This implies that E $^4^* * 1, where I is the identity in H^. 


Closely related to the basis $ k above are the dual vectors defined as 


♦ k * X~ l B*d»*HN»j., (2.27) 

which can be viewed as the result of applying an in put to the dual system model 
(2.18) and then "balancing* the output by dividing by the eigenvalue X^. 

Remark 2.11 The vectors P k defined by (2.27) are the eigenvectors of the dual 
observed-state covariance Q » B*<b*H*H$B, i.e., 

Q* k • X k P k . (2.28) 
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This result can be established by premultiplication of (2.27) by H<t>B and use of the 

condition R4> k - X*<|> k . Note that, if the dimension of the input space Hj is greater 

than that of the output space H^, then the do not span the input space. They do, 

however, span the range subspace of the operator B*4*H*. Consequently, they r«mnt 
be used to expand vectors in the null space of H$B. 

Remart 2.12 The vectors * k are also related to $ k by the equation 

4> k * X~ l H<t>B* k , (2.29) 

a result that can be obtained from (2.27) upon p re multiplication by the operator Hd>B 
and use of the condition R$ k * 

Remart 2 1 $ The dual-state covariance operator Q and its corresponding kernel q 
can be expressed as 

q = i 7 -^kV q a E (2 m 

a set of equations which are analogous to (2.26). This result can be obtained from the 
observation that Q = B*<b*H*H<t*B » B*<b*H* (E HOB « E Use has 

been made of the condition E 4> k <t> k * » L 

The vectors 4> k span the observation space H^. While the vectors ^ k do not 
span the input space Hj, they do span the range of B*$*H*. So far, no attempt has 
been made to obtain vectors that can be used to expand functions in the state space 
or in its dual space H^*. To this end, define 

Xj^-X” 1 $Bt k , F k - X~ l d>*H*$ k . (2.31) 

The vector x k is in the state space whereas the adjoint variables p k are in th* dual 
space. In general, neither one of these two vectors however spans the state space x'j. 

Remart 2.14 The vectors x k and p k are orthonormal with respect to H*H and BB* 
respectively, i.e., 


v H *«v 

* 0, P k *BB*p m ■ 0, 

k v m, 

(2.32) 

V H,Hl k 

» 1, p k *BB*p k . 1. 


(2.33) 
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These results can be established by the following sequence of operations: Xj_*H*Hx^ ■ 

(Hx^)*H* m * 4nii • Since ^ and are 

orthonormal, then (2.32) and (2.33) follow. 

Remark 2. IS The vectors x^, ^ and p^, ij, are related by 

4> k = Hx^ and * B*p k - (2.34) 

This result follows readily from the definitions in (2.27), (2.29) and (2.31). 

Remark 2.16 The vectors x^ and p^ satisfy the boundary-value problem: 


A 

” 

0 


X 


0 

BB* 


X 




k 

1 




k 

0 

A* 


P 

X 

H*H 

0 


p 

NX 

1 

J 


Lk J 

k 




L k J 


(2.33) 


This result can be established by operating on x^ in (2.31) by A and on p^ by A* to obtain 


Aij, » X^, 1 By k and A*p^ « > H*#^. 


(2.36) 


Then, substitution of (2.34) in (2.36) implies (2.35) 

Note the similarity between this problem •. 4 those traditionally encountered as 
necessary (and at times sufficient) conditions for optimality in quadratic optimal 
control and estimation problems for linear systems. 

The Fredh olm ^es ohrgnts of the Covariance Operators O tnd R 

The Fredholm resolvent of R is defined as that integral operator such that 
(1+ R) -1 = I-P, a relationship which immediately implies that 

R = P 4 RP and R * P 4 PR (2.37) 


In terms of the corresponding kernels r and p, these equations become 

r(x/C) * p(x/() 4 Xq r(x/T|)p(V?)dT) (2.38) 

for the case with continuously distributed data. In cases with discrete data, R and P 
are matrices whose general elements R. and P. are related by 


N 

*k,m 3 *k,m + ^ ^k,n^n,m" 

n*l 


(2.39) 
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In both of these equations (2.38) end (2.39), the unknown is the Fredholm resolvent P, 
whereas the observed-state covariance kernel R is known. 


qamertr 2.17 The integral operator R and its Fredholm resolvent P commute. This 
result, which can be stated as 

RP = PR, (2.40) 

is a direct consequence of (2.37). 

Remark 2.18 Equations (2.37) also imply that 

P =(l4R) _l R * R«+R)“ l , R = (I-P)~ l P « P(I-P) -1 - (2.41) 

Remark 2.19 In a manner analogous to (2.37) - (2.41), it is possible to define the 

resolvent S of the dual-state covariance operator Q by the relationship (UQ)~ l * I-S 
which implies 

Q = S ♦ QS, Q = S + SQ, SQ * QS, (2.42) 

and 

S = (!+Q) —1 Q - Q(I + Qf\ Q » (l-Sf X S « S(I-S)“\ (2.43) 

Remark 2.20 The Fredholm resolvents P and S can be expressed as 

P = E (\’/(U\’)] 4> k 4> k *. S» Z IXj/(UX k )Nr k t k *. (2.44) 

p(x/?) = E [X k /(1+Xj)l 4> k (x)4> k (f), s(x/*)« E [X^(l+X^Hr^x)t k (C). (2.45) 

These expansions can be established by substituting (2.26) and (2.30) into (2.37) and 
(2.41). 

Trigonometric Operator Forms 

Remark 2.21 The predicted-data- covariance operator (UR) can be expressed as 

I 4 R s I 4 TAN a a - SEC a a, (2.46) 

where TAN a a and SEC a a are the operators 


TAN a a » £ tan 2 a k 4> k <p k * . R, 


SBC 2 a ■ E se^a^k^k*’ 


(2.4?) 
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and tan is defined by tan « X^,. Note also for later xeference that 

(I+R) ^ » SECa * £ seca^ 4 > k 4 > k * , R ^ ■ TANa ■ £ tana^<J>^ 4 >^* . (2.48) 

Proof; Recall that tan ci g £ thereby establishing (2.47). Use 

of the formal expression I = E <|> k <l> k *for the identity I implies that (I+R) « E (l+tan a a^) 
^k^k*’ wllich le * d# to ^2.46). Equations (2.48) are obtained from (2.46) and (2.47) by 
performance of the square-root operation. 

Remark 2.22 Ihe Fredholm resolvent defined as P ■ I - (I+R) -1 of the covariance 
operator R can be expressed as 

P * SIN a a, (2.49) 

where SIN a a is the operator defined by the expansion 

SIN 2 a = £ E [X a /(l+X a )M> k 4> k *. (2.50) 

Proof; This result can be established by substitution of tan 2 ^ « in (2.44). 

R*m*rk 2.28 Equations (2.47) and (2.48) together imply that 

P = Rd+R) -1 - TAN a a[I+TAN a a]“ l - TAN a a {SEC a a] — 1 - S!N a a, (2.51) 

a trigonometric operator identity that ran be viewed as a generalization of a imilar 
identity involving scalars. 

3. DERIVATION OF THE LIKELIHOOD FUNCTIONAL 

Based on the results of the previous section, it is now possible to derive the 
likelihood functional in (1.3) to be minimized. Since the development required to 
achieve this is fairly lengthy, it is convenient to summarize in adivance the pivotal steps 
involved in the derivation: 

• the integral operator model y ■ m ♦ H6Bo + n in (2.3) is first coverted into an 

equivalent "spectral* form y k « + \ k » k + Uj,* where y k « 4> k *y, « k » 

n k ■ 4> k * n are the corresponding spectral coefficients. 

• the spectral coefficients y k of the data y are a sequence of independent 
Gaussian random variables with mean B(y k ) > m^, covariance - 1 + X a and 
probability density P k (y k ;9) ■ n~ exp [-(y^m^/Ra^] 
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• * “finite-dimensional* likelihood ratio is then defined as the product of a 
finite number N of terms involving the probability densities p^(y^;0) above. 

• an "infinite -dimensional" likelihood ratio is obtained by letting the number N 
of spectral coefficients approach infinity. The related negative-log 
likelihood functional in (1.3) is obtained by taking the negative logarithm of 
the functional that results from the limiting process. Of course, in cases 
where the data is finite-dimensional (obtained by means of a finite number of 
discretely located measurements), the limiting process Involved in this last 
step is not necessary. In this case, the "finite - dimensional" likelihood 
function obtained in the previous step is the function to be minimized to 
obtain the parameter estimates. 

The remaindc of this section contains a more detailed derivation of the foregoing 
results. 

Recall that 

y = m + H$Bo 4 n, (3.1) 

where m * H4>Cf and f is the input. As outlined above, the first sv~p toward evaluating 
the likelihood function is to convert (3.1) into an equivalent “spectral" form by using he 
eigenvectors <f> k and i.e., 

y - E y k <t> k . «■> * E « k * k . n - £ n^, m E nyjy (3.2) 

Substitution of (3.2) and (3.1) and premultiplication of (3.1) by <J> k * leads to 


Vk*V 


(3.3) 


Result 3.1 ^ and n^ are independent Gaussian random variables with mean and 
covariance given by 

MEAN COVARIANCE 

E(o k ) = E^) * 0 B(y*) » 1 + 


E(y k ) * n^ E(« k ) “ * 1 

E(7. 7 ) * 0 m*k 
t m 

where y. » y k - m^. Hence, y k is a sequence of independent Gaussian random variables 
with mean m^ and covariance 1 4 X k> 
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M 

Let y * [yj.—.yjj] be an N-dimensiorul vector consisting of the first N spectral 
coefficients y.^ of the data y. Because y^, are independent Gs. a random variables 

with mean and covariance o£ » 1 ♦ \*. their corresponding probability densities 

P k (y k ;0) » n _/l c7~ l exp (-y*/2a*) can be multiplied to obtain the probability density 

ltf N 

p(y ;9) of the composite N- dimensional vector y , i.e., 


N 

N 



p(y N }0) = n 

p k (y k J0> * n 

o^if^ezp 

(3.4) 

k.l 

k-1 




In order to obtain a likelihood functional for the identification problem with the 
function-space process y as the data, it would be desirable to set N-*°° and obtain what 
would be in the limit a probability density functional (PDF) for the vocess y. 
Unfortunately, this limit may not exist because the right side of (3.4) may not converge 
as and consequently a PDP for the process y cannot be defined in this manner. 

However, this can be circumvented by dividing by 
N 

p o (y N s 0) . R tr“ Vl exp (-yJ/2). (3.5) 

k-1 

This results in 

N exp [-(y -m )*/2(l+\ )] 
k k k 

A(y N ;0) « n , (3.6) 

V* 

k»l a a 

(1*\ ) exp (-y /2) 

k k 

N 

which can be viewed as a likelihood ratio consisting of the PDF of the process y with 

the "signals" and m^ nonzero, divided by the similar PDP of y^ with the signals 

and rrij, set to zero. The term likelihood ratio used to describe (3.6) is consistent with 

terminology common in the theory for detection of Gaussian signal! in additive 
Gaussian noise [3). 

Although the limits of p(y^}9) and P Q (y**t6> appearing respectively in (3.4) and (3.5) may 

not exist when taken independently, the limit of their ratio in (3.6) is a well-defined 
quantity specified by 
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exp[(-Vj) (y-m)*(I+R)~ ^(y-m)] 


(3.7) 


\<y,Q) = Urn A(y**;0) 

[det(J+R)J V * exp[(-V*)^y] 

This is the desired expression for the likelihood ratio that the maximum-likelihood 
method seeks to maximize . It can be interpreted as the likelihood ratio for the 
detection of the "signal" m ♦ K<t>Bu in (3.1), in the presence of the noisy Gaussian signal 
n. Instead of maximizing A(y;6) directly, it is more convenient to minitnig* the 
negative-log likelihood functional defined as J(6;y) * -log [A(y>G)], or, more explicitly, 

J(©jy) = */a log det [I+R(0)1 + */« (y-m(0)]*[I+ R(0)] -1 [y-m(0)l - '/* y*y. (3 8) 

Note that for the special case with no deterministic input. ,*m=0 in (3.1), and the 
negative-log likelihood in (3.8) educes to 

J(0;y) = V. log det [I+R(0)J - V. y*P(0)y, (3.9) 


where P(0) * I - (I+R(0)] 1 is the previously defined (in Sec. 2) Fredholm resolvent uf 
the predicted-data-covariance operator R. 

The first term in both of these last two equations can be cast into an equivalent and 
somewhat more convenient form in use of the identity [4] 

log det [I-f R(0)] = Tr log [I + R(0)]. (3.10) 

Substitution of (3.10) in (3.9) leads to 


J(0;y) * V* Tr log [1+ R(0)] + V* [y-m(0)]*ll+ R(0)f x [y-m(0)} - V* y*y, (3. 1 1) 

which has been recorded previously as (1.3) and constitutes the central aim of this 
section. 

Reorientation 


The method of maximum likelihood, as defined here, results in estimates that 
minimize J(0;y) in (3.11). This minimization problem can be viewed as a function-space 
nonlinear programming problem subject to the system model constraints that 
R(0) = K(0)<t>(6)*(0)B*(0)<t>*(0)H*(0) and m(0) » H(0)4>(0)C(0)f. Since no closed-form 
solution to this problem exists, it is necessary to use .numerical methods for 
optimization. However, there exist alternative formulas for the likelihood ratio that 
are more convenient to use in the implementation of the numerical methods. Such 
formulas are developed in the following section. 
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4. ALTERNATIVE FORMULAS FOR THB LIKELIHOOD FUNCTIONAL 


V, Tr log IUR(0)] 4 ‘/, ly-m(0)]*H+R(e)} ‘ly-m(0)] - V. y*y 

BASIC 

V. Tr log [UR(0)J ♦ V, Iy-m(0)j*[y-Hu o (0)J - */,y*y 

SMOOTHING 

*A L llog(UX a ) 4 (l4X*)~ 1 (y i _-m^) a -y£] 

SPECTRAL 

Tr log [UK(0)J 4 */* z*(0)z(e) -z*(0)y 

SQUARB-ROOT 

PILTBR 

Tr log [SBCa(0)] 4 V. z*(0)z(9) -z*(0)y 

TRIGONOMETRIC 

OPERATOR 


In the above table, the basic formula is expressed in terms of the suspected mean m and 
covariance UR = I + of the data y. The smoothing form is specified in 

terms of the optimal smoothed estimate u Q * E(u/y), representing the conditional mean 

of the state u given the data y. The spectral formula is obtained by substitution in (1.3) 

of the eigensys^em expansions R * EX a y • Ey^$^, and m * where X® 

and <t>^ are the eigenvalues and eigenvectors of the observed-state covariance operator 

R. The squ are-root filter formula, previously recorded in (1.4), is based on the 

y. «/* 

factorization of the predicted-data- covariance operator as (UR) » (UR) (UR) and 

-i % 

on the definitions z * Ly 4 (I-Lhn and (UK) * (I-L) ■ (I+R) . Finally, the 

trigonometric operator formula is obtained from the square-root filter expression by 

use of the identities UR • SBC*a and L • I- CO Sc developed in Sec. 2. 

Although the derivation of the above expressions leads to significant insight about the 
structure of the likelihood functional, it is not within the scope of the paper to 
inver'igate all of these alternatives to the same level of detail. The formula involving 
tut predicted-data- covariance square-root filter appears to be the most convenient to 
implement the numerical search for the optimal estimates. This section, however, aims 
to first develop the results summarized above. 

Formulas Based on the Opti cal Smo othed Stat * gjtiBMr s& 

Result 4.1 The negative-log likelihood functional can be t pressed as 
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(4.1) 


1(6*) « V* Tr log (UR(0)J ♦ V. [y-m(0)]*[y-Hu o (0)J - */* v'y, 

u (0) = G(0)y + [I-G(0)H]<b(0)Cf, (4.2) 

c 

G(0) = R(0)H* H+HR(0)H*r l « (4.3) 


where u q = £(u/y) is the conditional expectation of the state u given the data y, and G is 
the estimator gain. 

Proof: It will be shown in Sec. 8 that n in (4.2) is the conditional mean and that G in 

o 

(4.3) is the corresponding estimator gain. Therefore, for the sake of the discussion 
here, assume that (4.2) and (4.3) are valid. Multiply in (4.2) by H and use (4.3) in the 

resulting equation to obtain 

Hu o = HGy + (I-HGhn, (4.4) 


and 


y - Hu = (I-HG) (y-m), (4.5) 

where m * HOCf is as before the suspected mean of the data y. However, recall the 

identity HG = HRH*(UHRH*) -1 - I - (UHRH*) -1 so that I-HG - (I+H5H*) -1 - (UR) -1 . 
Hence, substitution of this last identity in (4.5) leads to 

y - Hu * (UR) -1 (y-m). (4.6) 

o 

This is the central result required to establish the equivalence between (4.1) and \3.11). 
To this end, substitute (4.6) into the second term on the right side of (4.1), and observe 
the equivalence with (3.11) by inspection. 

Result 4.2 The negative-log likelihood functional can be expressed as 

J(0;y) » V. Tr log IUR(0)] + V* IB(u)J* H*H (B(u/y)] - V, lB(u) + B(u/y)]*H*y (4.7) 

where E(u) » OCf and B(u/y) are respectively the unconditional and conditional expected 
values of the state u. 

Proof: This result can be established as a corollary to the Result 4.1 by co mb ini n g 

the last two terms on the tight side of (4.1) and use of the equation m * Hd>Cf. 

Both of these results express the likelihood functional in terms of a quantity u q in (4.2) 

which is the conditions, expectation B(u/y) of the state given the data y. This quantity 
is also known to be the best linear mean-square estimate as well as the optima) 
least- squares estimate. The coincidence of the best mean-square estimate and the 
optimal least-squares estimate, both of which can be computed by the conditional 
expectation formula (4.2), is explored at length in Ref. (1). 
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Remit 4.3 The negative-log likelihood functional can be expressed as 


J(0;y) = V* £ [logfWX*) 4 (I4X*) 1 (y^-m^) a -y^], (4.8) 

where y^ = <J>^*y and * <t>^*m are the spectral coefficients of the data and the 

suspected mean m, and X* are the eigenvalues of R. By substitution of X^ - tana^ in 
(4.8), this equation can be cast as 

J(0;y) = V* £ [log (sec*^) 4 cos’a^ (y^ - m^-y^J. (4.9) 


Proof: Equation (4.8) can be established by taking the negative log of A(y ;0) in 

(3.6) and letting N-*». Use of the identity X^ * tano^ in (4.8) leads to (4.9). 

Result 4.4 The negative-log likelihood ratio can be expressed as 

J(0;y) * £ [V* log (WXj) 4 V. zj - z^jJ, (4.10) 


where 


Zj, * Lj, y^ 4 (l-L^lm^ and » 1-cosa^. (4.11) 

Proof: Define the "residual* process 

e^ » y^ - Zj,, (4.12) 

as the difference between the data y^ and the "filtered* estimate z^. Observe that e^ « 
cosa^/y^-m^) by substitution of (4.11) i=. (4.12). Substitute this last equation into the 
second term on the right side of (4.9) to obtain (4.10). 

The formula for the likelihood functional in (4.10) can be viewed as the "spectral" 
version of the predicted-data- covariance square-root formula described below. 

Predicted- Data- Covariance Square- Root Formula for y^jfej oi Functional 

Result 4.5 The negative-log likelihood functional can be expressed as 

I(0iy) - Tr log [I4K(0)J 4 V. z *(0)z(0) - z*(0)y, (4.13) 

where 

z(0) - L(0)y 4 [I-L(0)]m(0), (4.14) 

with L(0) and K(0) defined as 
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L(0) = I - [I+R(6)]' V *, K(0) « fUR(0)] V * - L (4.15) 

Proof : Conversion of the first term (1/2) Tr log [UR(0)j in (3.11) into Tr log H*K(0)] 

follows because (1/2) Tr log(UR) * (1/2) Tr log (I*K) a « Tr log rt+K). Conversion of the 
last two terms on the right side of (3.11) into the desired form in (4.13) follows from the 
identity 


(y-m)*(UR) 1 (y-m) « [(I-L) (y-m))*|d-L) (y-m)] » (y-x)*(y-z), (4.16) 


where use has been made of the fact that (UR) 1 *• (I-L*) (I-L). 

Result 4.6 The operators L and K can be represented in terms of the following 
eigensystem expansions: 

L = Z (l-cosoj,)^^* K * £ (seco^ - 1) 4> k <p k *, (4.17) 

where ol^, = tan l \ k , and 4> k are the eigenvectors of R. 

Proof: Let 


L = Z L k 4yt> k * with 1^ - 4> k *L4> k , (4.18) 

and then evaluate the as yet undetermined coefficients Lj, from L * I - (I+R) - ^* in 
(4.15). To this end, premultiply L in (4.15) by 4> k * and postmultiply by 4> k to obtain 
a -*/i 

L^ -- 1 - (1+Vj.) * 1-cosa^, which is the desired result. 

Similarly, to obtain the desired expansion for K, seek to determine the coefficients Kj, 
in 

K = £ K k 4> k 4> k * with ^ - <b k *K 4> k . (4.19) 

Multiplication of K in (4.15) by 4> k * and 4» k leads to K k » 4> k *K<J> k * (l+X*) ^-1 « 

seca k ~l. 

Trigonometric Operator Formulas for the T-iVaHhnod Functional 
Result 4.7 The log- likelihood functional can be expressed as 

J(0?y) - Tr log [SEC^eiJ ♦ */. z*(0)z (0) - z*(0) y, (4.20) 


where 
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*(9) > (I-C0Sa(6))y + COSa(9)m(0), 


(4.21) 


with COSa(0) * [UR(0))~^* * Z cosa^^* (4.22) 

Proof: Recognize that (4.17) is^lies that 

L(0) - I-COSa(0) and K(0) * SBCa(0) - 1, (4.23) 

and use these identities in (4.13) and (4.14) to obtain (4.20) and (4.21) respectively. 
Result 4.8 The negative-log livelihood functional can be expressed as 

J(0;y) » Z [log seca^O) ♦ V. z£(0) - ^(0^(0)!, (4.24) 

where and y k are the “spectral" coefficients 

z fc (0) = 4> k *(0)z(0), y k (0) * (4.25) 


and as before o k = tan x \^,, with being the eigenvalues of R. 

Proof: This result, which is closely related to Result 4.4 above, can be established by 

observing that SECa(0), z(9) and y in (4.20) can be ezp a n d ed as 

SECa(0) = £ seco^ * * Z 7 * Z (4.26) 


Selection of Preferred Formr * 1 * for S earch ha bmuutiop 


In principle, all of the above formulas for the likelihood functional J (©jy> can be used as 
a point of departure to compute the gradient dJ/39 and the corresponding Hessian 

d 2 J/d0 3 - and to thereby obtain the necessary ingtedients to implement the 
Newton- Raphson search for optimisation. The calculations involved in the numerical 
search can vary significantly, however, depending on which of the forms is used as a 
starting point. It is therefore of interest to conduct a detailed investigation of the 
relative advantages and disadvantages of the various methods to implement the search 
that arise from the various forms of the likelihood functional. Such an investigation is 
currently in progress and will be reported on in future work. In this paper however, the 
formula selected to compute the gradient and Hessian is that based on the 
predicted- data- covariance square-root filter in (4.13). 


5. 


COVARIANCE BIGBNSYSTRM SENS! 


!U f i? 


TO SMAT.T. PAP AMRTRP CHANGES 


As a preliminary to the evaluation of 3J/30 and d s J/38 2 involved in the numerical 
search for optimization, it is necessary to conduct an analysis of the perturbations 6\ k 


and 64> k of the eigenvalues and eigenvectors of R • H<hBB*4>*H*, with respect to 
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variations 66 of the parameter distribution 6. Such an analysis will provide the 
mathematical tools that will be used in subsequent sections to evaluate dJ/66 and 

0 J j/ae 2 . 

By definition! and ^ are the nontrivial solutions of 

R (0)^(6) = X^e^e), (5.1) 

where the dependence on 6 of R, 4^ and Xj, has been explicit. The ultimate objective 

of this section is to develop analytical formulas for calculating the first-order 
perturbations and 6<j> k of and 4> k with respect to small changes 66 in the 

parameter distributions 6. 

Definition of 6X^, dX.^/36, 6d> k and 6d^/36 

It is assumed here that the Frechet differential [2] of X^, at 6 exfsts and that it can 
be computed by 

6\ k (6;66) * JdX k (6+Y6e)/dTl Y=0 , (5.2) 

where y is a scalar and 66 is an admissible perturbation of 6. Bquation (5.2) is actually 
the formula typically used for computation of the Gateaux differential. However! it is 
assumed here that both of these derivatives exist and coincide and that therefore (5.2) 
can be used to calculate the Frechet derivative. 

Since X k is Frechet differentiable (admittedly by assumption, as an investigation of 

the technical conditions required for differentiability is not within the scope of this 
paper), its differential 6\ k (6;£8) can be expressed as 

6\ k (6;66) - iaX k (6)/aej6e, (5.3) 

where c X k (6)/d6 is a bounded linear functional referred to as the Frechet derivative of 
\ k ai 6. The transformation 3\ k /38 can also be viewed as a function space gradient of 
\ k at 6. Similarly, the eigenvector differential 6$ k (d;£'') is defined as 

6q> k (6;66) - I3d> k (6;66)/36] 60. (5.4) 

where (3q> k (6)/38] is the Frechet derivative, assumed to be linear and bounded. 

Calculation of 6X^ and 3X k /36 

Recall that the $ k in (5.1) are orthonormal so that 
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0 , 


m*k. 


(5.5) 


4> k *4> k * 1 and \*4> m * 

Multiplication of (5.1) by $ k *and use of 4> k *4> k » 1 leads to 

X k * ♦r** 4 V <5.6) 

which can be taken as the point of departure for calculation of 5X> k and dX^/39. 

Result S.l The Frechet differential 6)^, (6:56) can be expressed as 

6\ k (6:5e) . -\J(6> lp k *(e)6A(©:6e)x k (0)), (5.7) 

where 6A(9;60) is the differential of A defined as . 


5A(6;50) » [dA(0+Yde>/dYl YxO (5.8) 

and p k and are the vectors defined as p k * \ k 1 <h*H^4> k and ■ X~ l <5B^r k in Sec. 2. 

Proof; Performance of a first-order perturbation on (5.6). and use of the condition 
4> k *6<J> k * 0 leads to 


6\ k = (2\ k ) _l <t> k *6R<t> k , (5.9) 

where 5R(6:56) =* (dR(©4Y50)/dYj evaluated at y * 0. However, since 5R * 
6(H4>BB*54>*)H*, then 

5S » H(64>)BB*4>*H* 4 H4>BB*(64>*)H*. (5.10) 

It can be observed from (5.10) that evaluation of 6$ is the central calculation 
required to determine 55. In order to simplify notation, without loss of generality, it 
has been assumed in arriving at (5.10) that B and H do not depend on 6. In most 
practical cases, this assumption is satisfied because the poorly known parameters occur 
in the operator A. 

To compute 6<h, as required by (5.10), recall that A(0)<b(9) • I, so that (6A)4> 4 
A(64>) s 0, and 

6<|) * -4>(6A)4>. (5.11) 

Substitution of (5.11) in (5.10) leads to 

6 R « - H4>(5A)4>BB*4>*H* - H4>BB*4>*(6A)*4>*H* (5.12) 

Multiplication by 4> k *(* )$ k results in 
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(5.13) 


= - <4> k *H<t»6A«l>BB*<t>*H*4> k ) - ($ k *H<DBB*$*)6A*(<D*H*4> k ). 

Finally, use of the definitions p fc = X” 1 <t>*H*4> k and = X” l <Mfcp k in (5.13), and 

substitution in (5.9), implies (5.7). In performing this last step, it has been assumed that 
A = A* is formally self-adjoint, a condition that is valid on most problems of practical 
interest. 

Discussion and Additional Assumptions on A 

The above result, although a step in the right direction, is still somewhat 
intermediate because the differential 6X k in (5.7) is expressed in terms of the yet to be 

determined differential 6A. To proceed further, it is convenient to make two additional 
assumptions (typically satisfied in practice): 

• A(0) is linear in 6 so that A(0 ^ + 0^) * A(0 j) ♦ A(0)^ for two admissible 
distributions 0^ and 0^. 

• A(0) can be factored as A(0) * D*(0)D, where D and its corresponding formal 
adjoint D* may in general be matrix differential operators. 

Based on these assumptions, it is now possible to derive the following more explicit 
formulas for 6X^ and 3X^/3©. 

Result 5.2 The Frechet derivative (0)/30 of ^ is 


3X k (0)/30 = Xj Dp k (0) • Dx k (0). (5.14) 

Proof: Since A has been assumed to be linear and factorable 

6X k = -X^p k *D*(60)Dx k - Xj <(Dp k ) 60 (Dx^ (5.15) 

where the last equality is a consequence of a process analogous to integration by parts. 

Result 5.3 Since 3X^/30 has been assumed to be a bounded linear functional in X, 
it must be expressible as 

[3X k (0)/30]60 = <3X k (0; • )/30, 60> x , (5.16) 

where 3X^_(0; • )/30 is an element of X*(Q). Futhermore, I3X k (0{ *)/30] can be evaluated 
from 

3X k (0,x)/30 * X“ Dp k (0;x) • Dx^Oix). (5.17) 
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Proof : The rigorous derivation of this result is not as yet available. The result is 

accepted somewhat formally on the basis that a bounded linear functional can be 
represented by an element in the dual to the space in which the functional is defined. 

Calculation of 6<b^ and 3d>^/88 

Result 5.4 The Frechet differential 64 ^ (0»60) of 4^, can be expressed as 

6<|> v (6;6e) = £ ((d> * 6 R 4 >.)/a* - \*)] A , (5.18) 

if m k c m * m 

m*k 

where 6R is the differential of the observed- state covariance operator R. 

Proof : Since R4> k = X^4>^, 


(6Ei<p k + R64>j. = * \j 64 y (5.19) 

Now, seek an expansion for 6$. in terms of the orthonormal basis $ , i.e., 

K m 

‘-V E 'tanV 'tan ■ ts m 

m*k 

where c. are scalar coefficients to be determined. Note that the orthonormality of 
km 

4> k implies that c^ = 0, so that 6$ k does not have a component in the direction of 4> k - 
To evaluate c^, premultiply (5.20) by 4> m * to obtain 

O^* 6 **™ ♦ 4> C5.21) 

m mm K K m x 


Use of the conditions $ *R 

m 

leads to 


\ 2 4> * and c. . 4> *64>. and rearrangement of terms 

id in cm m c 


c km ‘ < V 4t V /,X i - X m'- <5 ' 22) 

Substitution of (5.22) in (5.20) leads to (5.18), thereby establishing the result. 

Equation (5.18) is similar in nature to (5.9) in that it expresses the desired 
differential in terms of the yet to be determined quantity 6R. 

Result 5.5 The Frechet differential 64 ^, (0$60) of 4 ^ can be expressed as 

^<6,69) . I I Vn. ,(X i- X m >1 'Vm* (4A) H 4 l „V (4A > W (S - 2S > 

m#k 
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where p fc = tod = X~ l OB# k . 

Proof: Substitute (5.12) in (5.18) end use the definitions for end x^. 

Equation (5.23) is valid without making the additional assumption that A(9) is 
linear in 6 and factorizable as A(6) = D*(0)D. If these two assumptions are now made, 
the following result can be obtained. 

Result 5.6 The Frechet derivative 3^, (0)/30 is specified by 


at> k (0)/ae= z 


I lK k Dp m - Dp k M» : 


m 


m*k 

Proof : This result follows by substitution of 6A(0) * D^fdOlD in (5.23). 

Closely related to <$4> k is the differential 

6(4>k4>k*) = * (6 ^ k ^ k * 


(5.24) 


(5.25) 


of the outer product The corresponding Frechet derivative 3(<p k 4> k *)/30 is 

evaluated in the following result. 

Result 5.7 The Frechet derivative [3<4^ 4^, *)/30] is specified by 


aMyt^l/W - I IX,. [X k Dp m - . X m Dx m - Dp,J I* m v * Vm*>- 

m*k (5 26) 


Proof: Use (5.24) to evaluate the right side of (5.25) and recall that * 

ld(<t> k <i> k *)/d0]60. 


Discussion 


The results obtained above provide the key tools required to evaluate the 

function-space gradient 3J/30 and Hessian 3 a J/30 a of the likelihood functional. The 
most useful formulas are (5.17) for the derivative 3X k /d0 of the eigenvalue \ k> (5.24) 

for the derivative 3q> k /39 of the eigenvector dy and (5.26) for the derivative 

3(4 ) k<t>k*V30 of the outer product ($ 6 *). These formulas will be used repeatedly lr. 

the following section. 


349 



6. SPBCTRAL REPRESENTATIONS FOR THE GRADIENT. APP POTTMATR husstam. 
AND NEWTON- RAPHSON SEARCH 

Implementation of the modified Newton- Raphson search for the optimal parameter 
estimates requires calculation of the gradient 3J/30 and of an approximation to the 

Hess 'l operator 3 a J/30 a . These calculations are best achieved nalng the 
predicted-data- covariance soua re-root filter in Result 4.5 that expresses the likelihood 
functional as 

J(6 } y) = Tr log [UK(0)] 4 V. z*(0)z(0) - z*(0>y, (6.1) 

where z(6) = L(6>y 4 [I-L(8>] m(0). Function space differentiation of (6.1) with respect 
to 0 leads to the gradient functional 

g(0;y) = 3I(0;y)/a0 = Tr (OL/30) (UK)) 4 (t-y)*<dz/30), (6.2) 

and to the approximate Hessian operator 

M(0;y) = Tr IOL/30) (UR) (3L/30)) 4 (3z/30)*(dz/d0), (6.3) 

upon which the Newton- Raphson numerical search is to be based. An updated estimate 
0 n+ 1 = 0 n - 60 n is obtained by specification of the parameter change 66° defined as 

60 n = M~ 1 (0 n ;y)g(0 n ;y). (6.4) 

The main objective of this section is to replace the operator s , arions (6.2) and (6.3) 
with a set of equivalent matrix equations more convenient calculations. The 
fundamental approach to be used consists of representing the ^unction space derivatives 
3L/30, 3m/38 and dz/80 - which have only been derived in terms of operator symbols in 
(6.2) and (6.3) - in terms of e specific orthonormal basis defined by the eigenvectors 

of the observed- state covariance operator R. 

Spectral Representation for the Gradient 

Result 6.1 The Frechet derivative 3L/30 of the predicted-data- covariance 
square-root filter L can be represented as 

3L/30 = Z Z ‘kmVm*’ (6 * 5) 

k m 

where the spectral coefficients a ^^ ■ $j,*(3L/30)4> m are specified by 


4 kk = ,ln ‘*k Dp k* °V 


( 6 . 6 ) 


A km * ^m^rn "^ 1 Ico,a k * co,a m 1 ^k^m* ' X m r "‘m^k 1 (67) 
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Note that a^ m defines a matrix whose diagonal elements are provided by (6.6) and 
whose corresponding nondiagonal elements are given by (6.7). 

Proof : Observe L = £ (l-coscOq^* implies 

3L/ae * £ {sina i (da.JdO'kpfl* - cota { Of^.^/aej}. (6.8) 

Substitution of this equation in a ^ = <J> k *(3L/36)d> m and use of orthonormality of d> k 
lead to 

»y, = sina^ cos^ql^ (3X^/3©), u. ^ m ■ -eota ^*0^ m /d8) - cosa k (3# k */36)<|> m , (6.9) 

where dX^/39 and 3# k /36 are the function-space derivatives evaluated in (5.18) and 

(5.25). Substitution of these two equations from 'Sec. 5 in (6.9) leads to (6.6) and (6.7) 
thereby establishing the result. 

Result 6.2 The Prechet derivative dm/36 of the suspected mean m(6) is represented by 

dm/36 ■ £ (3m/30) k 4> k , (6.10) 

with the spectral coefficients (3m/30) k specified by 

(dm/36). * \ k (Dp k * D<t>Cf), (6.11) 

and <t>Cf in (6.11) denoting the suspected value of the state u. 

Proof : Since m » HOCf, then 6m » H6d>Cf > -H<hA(/9) <bCf, where the last equality 

follows from the condition 5<h * -<t>A(60)d>. Define now (6m) k as the spectral 
coefficient of 6m, i.e., 

(6m) k * 4> k * 6m - -4> k * HOA(60)<t>Cf - -Xjp^AideWCf, (6.12) 

where as before p k = \ k l O*H*$ k . Use of the identity p k *A(66)<9Cf «. -Dp k *D(<bCf)66 
in (6.12) results in (6m) k ■ (3m/36) k 6fc. with (3m/36) k given by (6.11). 

R esult 6.8 In the special case in which the deterministic input f is assumed to be a 

vector f = (f . _] of M inputs applied at the discrete locations £., an alternative to 

x In i 

(6.11) in evaluating (3m/36) k is 

M 

(3m/36) k - £ \ Dp k (x) *Dq>(x/C m )fm for k “ 1 N * <6.13) 

m-1 
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where 4>(x/£) is the Green’s function of the system model operator A. 

Result 6.d The gradient 3z/30 = OL/30)y+ (I-L) (dm/3©) of the filtered state esti: oate z 
can be represented as 

3z/3© = E (3Z/36) k 4> k , (6.14) 

where the spectral coefficients (3z/30> k * $ k * (3z/36) are given by 

(3z/30) k = E * km (x) y m + £ b km (x)f m’ (615) 

m=l ouT 

with a t specified in (6.6) and (6.7) and 

b. (x) * sin a. Dp. (x) *D4>(x/£ ). (6.16) 

km k k m 

Proof: Substitute 3L/36 and 3m/3© irom (6.5) and (6.10) into 3z/3€* * (3L/30)y + 

(I-L) (dm/30) and then compute the spectral coefficients (3z/36) k in 'C. 14) from 

(3z/36) k = <t> k *(3z/36). 

Result 6.5 The gradient g(0;y) in (6.2) can be represented as 

g(0;y) = E Isin a a k tana k (Dpj^'DXj^) - e k (3z/3©) k J, (6.17) 

where = 4> k *e are the spectral coefficients of the residual process e » y-z, and 
(3Z/36) k are given by (6.15). 

Proof: Substitute 3L/36 in (6.5), dz/36 in (6.14), e » E e <b and I + K * 

£ seca k <|> k 4> k * into (6.2) and use orthonormality of <)> k . 

Equation (6.17) provides the means to evaluate the likelihood functional gradient, one of 
the key ingredients of the Newton- Raphson iteration. The approximate Hessian 
operator M(6;y), which . the other major element required to implement the search, is 
evaluated below. 

Evaluation and Inversio n of the Approximate Hessian 

Result 6.6 The approximate Hessian M(9ty) in (6.3) is an integral operator whose kerne) 
M(x/£) is specified by 

M(x/f) » E (sec*^ a^Ix) a^f) ♦ z' k (x)z' k (f)l, (6.18) 
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where z' k = (dz/30)^ = <p^*(dz/dQ) is the k 1 * 1 spectra) coefficient "f ?z/d9. 

Proof: Substitute (2.26) and (6.14) into (6.3) and use the orthonorrraiity of 

Implementation cf an iteration step in the Newton- Haphson search requires calculation 

of 60 n = M 1 (0 n iy)g(0 D ;y), representing the incremental change in the parameter 

estimate. Inversion of M(O n ;y) is therefore required at every step of the .earch. This 
inversion is achieved by solving an integral equation as outlined in the following result. 

Result 6.7 The incremental parameter change 6 0 n can be computed as the solution 
of the following integral equation 

S Q M n (x/O £3 n (£)d£ = g n (x), (6.19) 

where is the approximate Hessian ‘ emel in (6.18), and g^(x) is the value of the 
gradient at the spatial location x. The subscript n in M q and g Q denotes that the 

corresponding quantities are evaluated at the parameter estimate 0=Q n . 

Proof: Observe that 60 n = M.~ l g implies M n 60 n = g Q , and express this last equation 

in term 1 ! of the kernel M to obtain (6.19). 

n 

7. PARAMETER ESTIMATION ERROR, CSAMES-RAO BOUNDS AND OPTIMAL 
INPUT DESIGN 

The objectives here are to obtain a C-R bound for the covariance of t ! '.e parameter 
estimation error and to begin an investigation of the problem of optimal input design by 
using the C-R bound as a criterion for optimal input selection. 

Recall that the covariance of an unbiased estimate 0 satisfies the inequalitv 

E(0 0 ♦) i M _1 (0 ), (7.1) 

P P ° 

where M(0 q ) L. the information operator uc lined as 

M(0 ) = Eia a j/a 0 a u 0 - Bioj/ae) o;/ae)*] 0 a a.i) 

o 0*0 *7*0 • 

O 0 

The corresponding mean-square estimation erro*- E(0 p *0 p ) sitis**^-? related 

inequality 


E(r p * e p ) i Tr(M x (© 0 » 


(7.3) 
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It can be observed that the key calculation required to obtain the C-R bound is the 
computation of E(3 2 J/80 2 ] as outlined below. 

Cramt^-Rao Bound for the Estimation Error 


Result 7.1 The information operator M(0 q ) is specified by 

M<e > = Eid^/ae 2 ]^ a = 2 TriOL/a©) a+R) OL*/ae>i 
o 0=© 

(7.4) 

+ (dm*/30) a-L*> a-L) Om/ae), 

where K = Hd>EE*d>*H % ' is the data-covariance operator, OL/30) is the derivative of 
L = I - (I+R) , and (dm/dO) is the derivative of the data mean m=H<DCf. 

Proof: Differentiate g(6;y) in (6.2) to obtain 

a 2 i/ae 2 = Trio 2 L/oe 2 vi+K) + caL/aenaK/ae)] + (z-y)*Oz/ae) 4 Oz/aewaz/ae). (7.5) 
Take the expected ’ xe in (7.5) abote, evaluate at 0 = 0 Q , and simplify to obtain 

E[a 2 J/ae 2 ' a = Tr i(3L/a0) (I+R) OL/30)’ + E[Oz*/a0) Oz/30)J. (7.6) 

y=y o 

Finally, use ' >) in (7.6) to arrive at (7.4). 

Result 7.2 In spectral form, the information operator M(0 q ) is specified by 

M(0 ) * £ (2 sec 2 a^ a^Ix) a^ff) * cos 2 Oj,m'j c (x)m , ^(f)J, (7.7) 

where a^ and m'^ = (Sm/dQ)^ are defined in (6.9) and (6.11) respectively. 

Proof : Use an approach similar to that used to arrive at (6.18). 

Inspection of (7.4) reveals chat the informatior operator M(0 o ) consists of the sum of 

two terms bcth of which are positive definite. In the first term, the data-covariance 
operator (I+R) appears as a "weighting” that is multiplied by the sensitivity filter 
dL/d9. Note parenthetically that in fart L is self-adjoint so that L » L*. The second 
term, on the other hand, will be shown to be a quadratic function of the input f. 

n esult 7.S Assume that f * ,...,L^] is a vector cf M inputs applied at the M discrete 

locations £ . T!ie information operator M(0 Q ) is an integral operator whose kernel 
M(x/£) can be e -.pressed as 


3f\ 



(7.8) 


M(x/f ) = U(x/f) + f T V(x/r'f, 

where 


U(x/£)=£ sia 4 a k Un 2 a k [Pp k (x)*rp k (x)J[Pp k (C)-Dx k (f)]. (7.9) 

k 

V(x/£)* E sin 2 ^ b k (x) b k (?), (7.10) 

k 

T 

and where t> k (£) is the M- dimensional vector 

b k (£) = |Dp k (£)-D<KC/f j) Dp k (f)*D4>(C/C^J, (7.11) 


with 4> being the Green’s function of A in (1.1). 

Proof: Substitute the eigensystem expansions for R in (2.26). for L in (4.17), for 

OL/30) in (6.5), and for dm/d6 in (6.10) into (7.4) to obtain (7.9) and (7.10). 

The second term in (7.8) is a quadratic form in the input signal f. This property can be 
used as a basis for optimal input design. 

Optimal Input Design 

The information operator can be used to state criteria for optimal input design. 
While several possible criteria exist, the one that is easiest to use is perhaps the 
maximization of Tr M(9 ): 


max Tr M(0 ) » U + f T Vf, f T f =* I , (7.12) 

o 

where 

U = J Q U'x/x)dx and V - / fi V(x/x)dx. (7.13) 

The optimal input f Q . which is the solution to the above optimization piob’em, is the 
eigenvector corresponding to the largest eigenvalue of the M-by-M matrix V. 


Other criteria for optimal input selection include: minimization of Tr (M l ), which 
would correspond to minimizing the Cramer- Rao bound; and minimization of 

X (M -1 ), where X is the maximum eigenvalue of M~ l . While these last two 

may 

criteria could be superior to (7.12), they both have the disadvantage of requiring 

inversion of the operator M( ). However, the requirement for such an inversion may 

o 

not be a serious additional drawback because a similar calculation is required to 
implement the Newton- Raphson search outlined in the previous sections. 
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Vanishing Bias of the Gradient 


Closely related to the above analysis is an investigation of the bias in the 
parameter estimate 9. The central result is as follows. 

Result 7.4 The expected value of the gradient functional g(9;y) vanishes at 9 * 0 q , i.e., 

Elg<9;y)J | 0=0 =0. (7.14) 

o 

Proof : Observ' that dz/30 = OL/89)y + (I-L) (dm/30), and recall that y = (I+K)e. 

Substitute this in (6.2) and take the ejected value. Finally, use the whiteness of the 
residual process, to be established in (8.46). 

8. FILTERING, SMOOTHING AND THB RESIDUAL PROCESS 

The central aim of this section is to con net an analysis of the smoothed estimate 
u q and of the filtered state estimate z q that emerges from the 

predicted-data-covariance square-root filter. This analysis leads to the following 
major results: 

• The smoothed estimate u q is optimal in a conditional mean sense. 

• The formulas that generate u q and have a predictor-corrector structure in 

which the final state estimate is the sum of: a prediction term-based on 
application of knewn inputs to the system model; and a correction term based 
on the difference between the actual and predicted data. The key element in 
these formulas is an estimator gain that provides the relative weighting 
between the two terms. 

• The covariance of the state estimation error inherent in both estimates can 
be evaluated by means of equations which, if written in operator notation, 
resemble those encountered in filtering and smoothing for linear dynamical 
systems. 

• Investigation of a residual process associated with the filtered state estimate 
z that has properties nearly identical to those of an innovations process: the 

C 

residuals are a white noise process with a unit covariance; the residuals and 
the measurements can be obtained from each other by means of reciprocal 
linear transformations. Because these transformation; *re not causal, the 
residuals are not a bona fide innovations process. However, they are as 
useful in deriving filtering, smoothing and identification solutions for elliptic 
syst .ms as the innovations process is in deriving similar solutions for linear 
dynamical systems. 
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• Development of relationships between the filtering and smoothing estimates 
that can be thought of as extensions to elliptic systems of the 
forward/backward sweep method for solution of filtering and smoothing 
problems in linear dynamical systems. 

• Development oi spectral representations for the predicted-data- covariance 
square-root filter and the optimal smoother in terms of the eigensystem of 
the state covariance R » <t>BB*<b*. This leads to simple ways to implement 
filtering and smoothing solutions on a computer. 

Smoothed and Filtered Estimates 


The smoothed and filtered state estimates u and * have been defined in (1.17) as 

o o 

u = <t>Cf 4 G(y-H<J»Cf), z « <*>Cf 4 g(y-H$Cf), (8.1) 

o o 

where G and g are Kalman- like gains specified by 

G = Z sin a a^x^4>j,*, g * £ (l-cosa^,) x^4>^. (8.2) 

The estimate u is referred to as a smoothed estimate because it is the 
o 

minimum- variance estimate of the state given the entire data set. This is established 
by the following result. 

Result 8.1 The smoothed estimate u in (8.1) is the conditional mean u _ « B(u/y) of the 

Q O 

state given the data. Furthermore, the estimator gain G in (8.2) can be expressed 
alternatively as 

G = RH*(I 4 HRH*)“\ (8.3) 

in terms of the state covariance R * 

Proof : Recall the general formuk. 

B(u/v) * E(uv*) [E(w*)] -1 v (8.4) 

derived in [4] for the conditional expected value of a zero-mean random process u given 
the related zero-mean random process v. Note that this formula requires calculation of 
the “cross-covariance" operator B(uv*) and the auto-coV4riance operator B(vv*). 
Define now the mean- centered state u * u~d>Cf « d>Bo and the mean-centered data y » 
Hu 4 n. By this definition, u and y are zero-mean. Therefore (8.4) can be used directly 
to compute u o = B(u/y), i.e., 


u o B(u/y) • E(uy*) lE(yy*)J % 


(8.5) 
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which indicates that to evaluate U qI it is necessary to first evaluate the covariance 

operators E(uy*) and B(yy*). These calculations are: B(uy*) » E(OB6)<j*B*<h*) = 

4>BB*<t>* and B(yy*) = E [(Hu+n) (Hu+n)*] * I * HfiH* Use of this in (8.5) leads to 

E(u/y) = Gy. (8.6) 

This together with the definition of V and J in terms of u and y implies (8.1). The 
equivalence between the two different expressions for G in (8.2) and (8.8) is established 
by use of the spectral expansions in Sec. 2. hi particular, use expansions (2.46) - (2.47) 
for I + R and the definition for Xj, in (2.31). 

As established by this result, the estimate u q has a very well defined probabilistic 

interpretation. It is not presently known if the utered estimate has a similar 

interpretation. Nonetheless, this estimate plays a very significant role in the filtering, 
smoothing and identification methodology for elliptic systems under development here. 
Its role is analogous to that of the filtered estimate emerging from a Kalman filter in 
the case of dynamical systems. This is further investigated below. 

Predictor-Corrector Structure 


To examine this structure, cmsider the equation for u q in (8.1) and illustrated in 

Fig. 8.1. Use of the deterministic input f^ and the system model <t»C^ leads to a 

predicted estimate The difference process y-H<t>Cf ^ is then formed and operated 

on by the estimator gain G ^ to obtain the correction term G(y-H<bCf) Firally, 
the correction term is added to the predicted estimate to obtain the optimal estimate 
u q . The equation for the filtered estimate. z q in (8.1) also has a predictor-corrector 

structure. The key difference between the twe equations in (8.1) is that the estimator 
gains are different. A relationship be' ween these two different gains G and g is 
explored later in this section. 


SYSTEM MODEL 



Fig. 8.1 Predfctor- Corrector Form of the Smoothed State Estimator 
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Estimation Brror Covtmnce ami KahnaiMifce Gains: Smoothing 


Since u q and x q are only estimates of the actual state u, it is of interest to 

investigate the inherent estimation error u * u-u and z » u-z . In particular, the 

p o p o 

aim is to determine the estimation error covariance, under the assumption that the 
actual model errors o and n in (1.1) and (1.2) are white-noise processes. 

Result 8.2 The covariance P * P* * E(u u *) of the state estimation error u = 


— P P P 

u-u q is specified by the following alternative formulas: 

P = (I - GH)R(I - GH)* + GG*. (8.7) 

V = R - RH* (I + HRH*) -1 HR, (8.8) 

P = ( I - GH)R = R(I - GH)*, (8.9) 

P = $B (I + B*<t>*H J 'H<t>B)~ 1 B*<b*. (8.10) 


Proof: To show (8.7), observe that u « d>Cf + <bB» and u q - <bCf + G(y - H<bCf) 

imply that u = u - u is 
P o 

u = (I - GH)<bB» - Gn. (8.11) 

P 

Hence B(uu p *) = E['T-GH)<t>Bu(j*B*<i>* (I-GH)* + Gnn*G*] » (I-GH)R(I-OH)* + GG*. 

where use has been made of the fact that e » [u,n] is a white-noise process with 
covariance E(ee*) * I. To show (8.8), observe that (8.7) implies 

P = R - GHR - RH*G* ♦ G( 4 HRH*)G*. (8.1.) 

Substitution of G = RH* (I + HfH*) -1 in (8.12) leads to (8.8). To show (8.9). observe 

that (8.8) can be expressed as P - R (I-GH)* « (I-GH) 8 by using G - RH*(I ♦ HRH*) -1 
in the last two terms of (8.12). To establish (8.10) , substitute 5 ■ <t>BB*<t>* in (8.8) and 

use the identities B*d>*H* (J+H<J>BB*<t>*H*) _1 HC>B - (I+B*<&*H*H<&B) -1 B*<!>*H*H<I>B - 1 - 

(1+ B*<t>*H*H<J>B) — 1 . 

Result 8.3 The operator HPH* is the Fredholm resolvent of HRH* so that 

(I + HRH*) -1 - I-HPH*. (8.13) 

Proof: Compute HPH* from P in (7.8) to obtain HPH* - HSH* [I - (1+HRH*) -1 ]. Use 

the identity (I4HRH*) -1 HRH* » I - (I+HRH*) -1 twice in this Ust equation to obtain 
(8.13). 

The aim now is to use (8.13) in (8.2) to obtain an alternative expression for the 
estimator gain. 
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Result 8.4 


The estimator gain G = RH*(I+HRH*) 1 can also be expressed as 
G = PH*, (8.14) 


where P = E(u u *) is the covariance of the smoothed state estimation error u . 

P P p 

Proof: RecaU that PH* = (I - GH)RH* = RH*II - (I + HRH*) -1 HRH*]. iince 

(I + HRH*) -1 HRH = I - (I + HRH*) -1 , then PH* = RH*(I + HRH*) -1 = G. 

Result 8.5 The mean-square smoothed state estimation error is given by 

E(u *u ) = Tr (P). (8.15) 

P P 

Proof: This follows from the definition of P as F = E(u u *). 

P P 

Note that many of the above formulas are very similar in form to the ones 
traditionally encountered in Kalman filtering for dynamical systems. For instance, Bqs. 
(8.3) and (8.14) are very similar to those used to compute the gain G for a Kalman filter 
in which R and P are the covariances of the estimation error associated with the 
predicted and _corrected state estimates. Note also that (8.8) implies that P is always 
smaller than R, which implies that the covariance of the estimation error after the 
observation y has been accounted for is smaller than the error covariance before the 
estimate correction occurs. 

Estimation Error Covariance and Kalman-like Gains: Filtering 

The aim here is to obtain results similar to results (8.2) - (8.5) above, but that are 
applicable to the filtered estimate z q . 

Result 8.6 The covariance E(z z *) of the filtered state estimation error z = u-z 
p p P o 

is given by 

E(z p z p *> = (I-gH) R(I-gH)* + gg*. (8.16) 

where R = <t>BB*4>* is the state covariance, and g is the filter gain in (8.2). 

Proof : Note that u = <t>Cf + <J>Bu. This and (8.1) imply that 

z p = (I-gH) <t>Bo-gn, (8.17) 

where use has been made of v-Hd>Cf = H<t>Bu 4 n in (6.1). Calcul'tion of E(z z *), 

P P 

.... 17) and the conditions E(ou*) = I and E(nn*) * I, leads to (8.16). 

This result applicable to the filtered estimate is analogous to (8.7) of the smoothed 
estimates. To obtain results that are analogous to (8.8) - (8.10) requires, howover, a 
preliminary definitions and results. The need for these preliminaries arises from 
-■t jltimate desire to find a spectral decomposition for the state covariance R = 
•' It is straightforward to obtain the spectral representation for the 
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observed- stale covariance HRH*. However, finding a similar decomposition of R is not 
as simple. The primary reason for this lack of simplicity is that the vectors ^ = 

X. may not necessarily span the entire space H. This is particularly true in 

cases in which the dimension of the input space is greater than the dimension of the 

observation space Hg. In order to consider this case, assume that the operator HOB has 

finite-dimensional range. This corresponds to the situation where there are only a 
finite number N of seniors and the observed- state covariance R = H0BB*0*H* is an 
N-by-N matrix. Assume also that the input space is either infinite-dimensional or 
finite-dimensional with dimension M greater than N. This second assumption 
corresponds tc cases where the uncertainty is distributed at M discrete locations or 
throughout the entire spatial domain Q. 

Result 8.7 The identity operator I mapping into itself can be decomposed as 

1 = 1 +1,, (8.18) 
o X 


where 


I =I-B*d>*H*R 1 Hd>B, I. <b*H*R l H<bB, (8.19) 

0 X 

and R = Hd>BB*<b*H*- is the observed-state covariance. In addition, I is in the 

o 

null- space of the operator 

R (•) = H<t>B( • )B*<t>»H*, (8.20) 

mapping the space of bounded linear transformations on HjX H^ into the space of 
N-by-N matrices. Furthermore, I Q and 1^ are orthogonal complements so that 

1 *1, = Tr (I IJ = 0. (8.21) 

0 X o X 

Proof : This result and its corresponding proof are illustrated graphically in Fig. 8.2. 

Eq (8.18) follows from (8.19). Substitution of I in (8.19) into (8.20) shows that R (I ) = 0 

so that I is in the null space of R( • ). That I Q and 1^ are orthogonal complements 

follows from substitution of (8.19) in (8.21) by calculation of Tr [I Q 1^] using (8.19). 

Resul 8.8 The i* • • y operator I mapping H^ into itself can be expressed as 

N 

1 * I Q ♦ £ <8.22) 

)-l 
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SPACE OF BOUNDED LINEAR TRANSFORMATIONS 
FROM INPUT SPACE INTO ITSELF 



Fig. 8.2 Orthogonal Complement Decomposition of the Identity in Hj z H^. 

Proof : Substitute R = into I^in (8.19) and use tpj « X~ x B*<b*H*<Pj. 

The above result simply reflects the fact that the ^ do not span H^, because (by 

assumption) there are only a finite number of them, and this number is smaller than the 
dimension of the input space. 

Result 8.9 The state covariance R « $BB*$* can be decomposed as 
N 

R * R q 4 £ X* x j* *• (8.23) 

M 

whirl* 

R « <t>Bl B*<t * (8.24) 

o o 

Furthermore, 

HR H* = 0, HR » 0, R H* * 0. (8.25) 

O 0 0 

Proof : To show (8.23), substitute I from (8.22) into 5(1) - <&B(DB*<t>*, and use x^ » 

X _1 <bB4r , . To show (8.25), substitute I from (8.19) into (8.24) and (8.25). 

) j o 
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Result 8.10 


The dual state covariance Q = d>*H*H<J> can be expressed as 


N 

Q = Z X? PjPj* (8.26) 

)=1 

where p. = - 

J ) J 

N 

Proof : Since the <f>. span the observation space H„ = R , th-a 

) 

N 

I N = Z 4)^* (8.27) 

)=1 

N N N 

v;here I denotes the identity in R x R . To obtain (8.26), substitute (8.27) in Q = 
d>*H*I N H<J> and use definition of p.. 

Define now the quantities 

N N 

r = (*/«) R Q + E (seca.-l) x.x.*, <1 = Z (seca.-l) p.p.* (8.28) 

}=1 )=1 

and note the following key identities. 

Result 8.11 The state covariance R = <DBB*<b* and r defined in (8.28) are related by 
R = r + r*+rH*Hr. (8.29) 

Furthermore, 

I + HRH* = (J+HrH*) (I+Hr*H*). (8.30) 

Proof : Substitute r from (8.28) and R from (8.23) into (8.29). Use the orthonormaliiy 

of x. with respect to H*H. This establishes (8.29). Equation (8.30) follows from (8.29) 

by forming I+HRH* from (8.29) and rearranging terms. 

Result 8.12 The dual state covariance Q in (8.26) and q 'n ,o.28) satisfy the identity 
Q = q+q* + qBB*q*. (8.3 D 

Furthermore, 

I + B*QB = (l+B*qB) (I+B*qB). (8.32) 
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Proof : Substitute Q in £3.26) end q in (8.28) into (8.31). Use the orthonormality of p^ 

with respect to BB*. This establishes (8.31). To establish (8.32), form I + B*QB using 
(8.31. and rearrange terms in the resulting equation. 

These are the preliminary results needed to evaluate the covariance of the estimation 

error associated with the filtered state estimate z . 

o 

Result 8.13 Th: filter gain g defined in (8.2) can be expresssed alternatively as 


g = rH* (I*HrH*) \ 
where r is defined in (8.28). 


(8.33) 


Proof: Substitute r from (8.28) into (8.33) and use R H* 

O 

recovers g in (8.2). 


0 and <J>.* 


x.*H* This 


Note the similarity between (8.3) and (8.33). The equation in (8.3) expresses the 
smoother gain G in terms of the state covariance R *_<PBB*<b. Eq. (8.33) is a similar 
equation for the filter gain in terms of r. The operator R in G can be interpreted as the 
state covariance. No similar piobalistic interpretation for r is known. However, its 
introduction is very useful because it allows development of formulas fo: the estimation 
error covariance and for the filter gain that very closely resemble those obtained for 
the smoothing solutions. 

Result 8.14 The covariance E(z z *) of the filtered state estimation error z * u-z 


E(z z *) * p + p*, (8.34) 

P P 

where p = p* is specified by the alternative formulas 

p = (l-gH)r(I-gH)* + gg* (8.35) 

p = r-rH* (I+HrH*) -1 ^, (8.36) 

p = (I-gH)r = r (I-gH)*, (8.3?) 

p = ( l /j) R q + £ (l-cos<ij) (8.38) 

Proof : To establish (8.34) and (8.35), substitute (8.29) in (8.16) and use the identity 

(I-gH) rH* * rH* d + HrH*)“ x * g. (8.39) 

To establish (8.36 ), observe that (8.35) implies that 

p - r-gHr - rH*g* + g(I + HrH*)g*. *3.40) 
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Substitute g * rH* (I + HrH*) -1 in (8.40) to obtain (8.36). To obtain (8.37) observe that 
the second term of (8.36) can be expressed alternatively as gHr and rH's*. To obtain 
(8.38 ), substitute r in (8.28) into (8.36) and use orthonormality of 

Result 8. IS The mean-square estimation errcu associated with the filtered state 
estimate is given by 


N 

B(Zp*z p ) = Tr |p + p*] * TrIR Q ) + 2 £ (l-cosa^) (8.41) 

)*1 

Proof : This result follows from (8.34) and (8.38). 

Result 8.16 The filter gain g can be expressed as 

g * pH*. (8.42) 

where p is related to the filtered state estimation error covariance by B(z z *) » p + p*. 

P P 

Proof : Since g = rH* (I+HrH*)~\ then g « rH* (I- Kg)* - rH* (I-g*H*) = r 

(I H*g‘)K* = r (I-gH)*h* * pH*. 


This equation is analogous tc (8.14) in that it expresses an estimator gain in terms of 
the covariance of the state estimation error. 

Result 8.17 The operators I 4 HrH* and I -HpH* are reciprocal i. e., 

(I + HrH*)~ l = I - HpH*. (8.43) 

Proof : Recall (I 4 HrH*) -1 » I-Hg « I-HpH*, where the last equality holds because 

g * pH*. 


Note that this result implies that the operator HpH* is the Fredholm resolvent of 
HrH*. The identity also immediately implies whiteness of the residuals process as 
investigated in more detail below. 

Pseudo- Innovations Properties of the Residuals 

Define the residual process in the usual way. as the di/fcret between the actual 
measurements and the predicted data emerging from the pred-cted-data-covariance 
square-root filter. i.e.« 

e * y-Hz (8.44) 

o 

This process turns out to have two key properties that are tearly identical to those of 
an innovations process: the residuals are white with a unit covariance; the residuals 
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and the measurements can be obtained from each other by means of reciprocal 
relationships. These two properties are established in the following results. 


Result 8.18 The residual process defined in (8.44) is white with a unit covariance, 

i.e., 

E(ee*) = I. (8.45) 

Proo£: Observe from (8.1) that e * (I-Hg) (y-H$cf). Hence, B(ee*) * (I-Hg) (I. HRH*) 

(I-Hg)* = I. This last equality follows from E[(y-Hd>cft (y-H<bCf)*J * I + HRH* and from 
(8.42, and (8.43). 

Result 8.19 The residuals e « y-Hz Q and the mean-centered measurement process 

y = y-HOCf can be obtained from each other by means of reciprocal linear 
transformations, i.e., 

e = (I - HpH*) y, y « (I.HrH*)e (8.46) 


where 


(I + HrH*) 1 * (I- HpH*). (8.47) 

Proof : Eq. (8.47) has been established in (8.43) and is restated here only to 

emphasize its relationship to the properties of the residual process. Eq. (8.1) implies 
e = (I-Hg)y. 

Relationships Betw eea Filtered and Stp n.-»h>n< 

While the smoothed and filtered estimates have been defined somew’ott independently 
of each other, these estimates are in fact very closely related. It is possible to ' tain 
one in terms of the other, as outlined in the following result. 

Remit 8.20 The smoothed and filtereo estimates and v are related by 
0 0 

u * z + ge, (8.48) 

o o c 

where 

e - y-Hz q (8.49) 

is the residual process, and g is the predicted-data- ovariance square --out filter gain. 

Proof : Observe that (8.1) and (8,3) Imply u • 4>Cf ♦ RH* (I + HfiH*) 1 (y-H<DCf). 

Use of (8.46) leads to u - <t>Cf + RH* (I*HrH*)“ l e Similarly, z^ in (8.1) and - in (8.33) 
lead to z « d>Cf -v rH*e. Hence, u -z - [RH* (I+K.H*)" * -rH*]e. Use of the identity 

O 0 0 

(8.29) in this implies that u 0 * z 0 * ge, which is the desired result. Note that (8.48) can 
be *>,* + r n in the alternative form 
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(6.50) 


u = a-gH)z 4 gy, 
o o 

Closely related to the above relationship between filtered and smoothed stete estimates 
is a relationship between the corresponding covariances of the state estimation errors. 
This is developed below. 

Result 8. 21 The filtered state estimation e.-or z = u-z and the residual process 

P o 

e = y-Hz are related by 
o 


e = n 4 Hz . (8.il) 

P 


where n is the measurement error. 


Proof: Note that e = y-Hz » Hu4n-Hz * Hrn-z ) -*n * Hz + a 

“ “ “ ft ft ft Ti 


Result 8.22 The covariance ? * E(u u *) of the smoothed stste estimation error 

p p 

u = u-u can be expressed as 
P o 


P * p 4 p* - plI*Hp, (8.52) 


where p 4 p* * E(z z *) is the covariance of the filtered state estimation error z_ *■ 
P P P 

u-z . Furthermore, 


I-Hf H* . (I-HpH*) (I-HpH*), (8.53) 


Pro of: Use (8.52) to obtain 

E(ez *) = E(nz *) 4 HE(z z *) » B'nz *> 4 H(p4p*). (8.54) 

P P ?P P 


Now use (8.17) to compute E(nz^*), i.e., 


B(nz *) x -g*. (8 57' 

P 

since E(no>*) = 0 by assumption. Substitution of (8.55) in (8.54) and use of j * j il* !eads 
to 


t(ez p *) * g*. (8.56) 

Since u * u-u , the” u * z - ge from (8.48). Hence, 
p o P P 

E(u u *) * £(z z *) - g B(ez *) - E(z 4 gL(ee*)g*. (8.57) 

P P P P P P 

Now use ,8.34), (8.43), (8 45) and (8.f6) to obtain (8.52). Equation (8.53) follows 
immediate, 'v from (8.52) by forming I - HPH* are; rearranging terms iu tl-i resulting 
expression Note that (8.52) implies that the gains C . id g aie related by 
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G • g ♦ g - gHg. 


(e .58) 


The last three results can be viewed as a generalization to elliptic systems of 
relationships encountered in filtering and smoothing for dynamical systems. For 
example* Equation (8.48) is a generalization to elliptic systems of the forward/backward 
sweep method for solution of two-point boundary-value problems. This method in 
general terms states that the smoothed states estimates can be obtained as a result of a 
two-stage process: forward filtering by means of a Kalman filter to obtain a filtered 
state estimate and a residual process; and backward smoothing to process the residuals 
and obtain a smoothed state estimate. This two-stage data processing approach has 
been extensively studied for linear dynamical systems. Bqs. (8.48) and (8.49) have 
exactly the same structure. This structure is illustrated in Fig. 8.3. 

The ovo. all diagram illustrates how the data y^ and the deterministic input f^ are 

(31 

processed to arrive at a smoothed estimate u l \ The estimation process consists of two 

o 

stages: a FILTERIN''' stage that results in a filtered estimate z^ and a residual 

o 

process^. This filtering stage is characterized by a predictor- corrector structure 

where a predicted estimate^ is first produced and then corrected by a correction 

term.^ The results of the filtering stage are then processed by the SMOOTHING 

[81 

stage. Central to both the filtering and smoothing stages is the gain g . The 
foregoing structure is nearly identical to that of the forward/backward sweep method in 
linear dynamical systems. There are* however, some key differences. One of the 
differences is that the filtering stage in the case of dynamical systems is based on the 
Kalman filter* whereas in the elliptic case under consideration here, this filter is 
replaced by the predicted-data-covariance square-root filter. Another key difference 
is that the Kalman filter is causal whereas the predicted-data covariance square-root 
filter is not, i.e., the filter gain g is a Fredholm operator as opposed to being a Volte rra 
operator. In the same vein, the smoothing stage for dynamical systems is backward (in 
time) or anticausal. In the elliptic system case, however, the smoothing stage is also 
characterized by Fredholm operators. The notion of causality is not even introduced 
here although it is possible to do this for certain classes of elliptic systems [1]. 


FILTERING 


SMOOTHING 


r 1 ~ " ’SMOOTHED 



Fig. 8.3 Filtering and Smoothing 
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Spectral Representations: Sm oothing. Filtering. and the Residuals 


The aims here are: to obtain spectral representations for the filtered and smoothed 
estimates u q and z q and the corresponding error covariances P and p; to explore the 

predictor- corrector structure of the spectral representations of the filter and 
smoother; and to investigate the pseudo-innovations properties of the spectral 
representation of the residual process. The term “spectral representation* means the 
use of an expansion in terms of the eigensystem <J>. of R and of the related functions 

x|r. = X. x.= Xp^Bifr. and p. = X7 l d>*H*q>.. 

Result 8.23 The smoothed state estimate u q can be represented as 


u q = <PCf + E sin a aL (y. - m.) x., (8.59) 

where ,, - *,*7 «d m, • 4,*m ere the W«trd compo«nu of the deu y end the 
suspected mean m = H<t>Cf. The related observed- state estimate Hu^ is specified by 

Hu = m + HG (y-m), Hu - (I-HG) m ♦ HGy. (8.60) 

o ' o 


In spectral form, Hu q 


l 


u 1 <p. where 
o^) 


* m. + sin a o. (y.-m.), u^ 


) ') J 


cos m^ + sin ^y^, 


(8.61) 


and u* » <p.*Hu . Let u * u - u denote the estimation error. The error covariances 
0)0 p o 

F = B(u u *) and HFH* « B(u *H*Hu ) are represented by 
P P P P 


F « S o ♦ £ sin 2 a. x. x * HPH* - E m*app*. (8.62) 

Furthermore, the corresponding mean-square estimation errors B(u p *n^) « Tr[P] and 

E f u *H*Hu ) = Tr [HPH*] are 
P P 

E(u *u ) * Tr [5 ] 4 E *in a a, x.*x., B(u *H*Hu ) « E ‘inV (8.63) 

P P o'- I ) I P P I 

Proof : To establish ( 8.59) . substitute y ■ Ey^ and m « Em^ in (8.1). To show 

( 8.60) multiply u q in (8.1) by H and recall that m • H<PCf. To establish 8.61 . multiply 

(8.60) by The equation for P in ( 8.62) follows by substitution of (8.23) in (8.8) and 

use of the conditions HR * R H* * HR H* - 0. The equation for HPH* in (8.62) 

o o o 

follows frr ht P and use of * Hx } . Bq ( 8.63) follows from (8.62) and the orthonormality 
of <ty 
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Result 8.14 The filtered state estimate z^ can be represented by 

z = <bCf + £ (1-cosa.) (y.-m.) x., 
o ) I J I 

The related observed state estimate z = Hz is 

o 

z = m + Hg(y-m), z = (I-Hg) m + Hgy. 

In spectral form, z = £ z.<p. 


(8.64) 


(8.65) 


z. = m. ♦ (1-cosa.) (y.-m.), z. * cosa. m. + (1-cosa.; y.. 
) ) ) ) ) ) ) “ i ) 


( 8 . 66 ) 


Let z = z-z denote the estimation error. The estimation error covariances H(z z *) 


p + p* and E(HZpZ^*H*) = H(p 4 p*)H* can be represented as 


P P 


P = (*/*) R q 4 £ (l-cosa^) x.Xj*, HpH* * £ (1-cosa.) $.$.*. 


(8.67) 


Furthermore, the corresponding mean- square estimation errors are 


where 


E(z *z ) = Tr(p 4 p*) f B(z *H*Hz ) * Tr [H(p 4 p*)H*], 
P P P P 


( 8 . 68 ) 


Tr IpJ = (V«) Tr [RJ 4 £ (1-cosa.) x.*x., Tr [HpH*] . £ (1-cosa.). (8.69) 

Proof : To show ( 8.64) , substitute y * Ey ^ .and m * Em jj> jinto z Q in (8.1). Bq. ( 8.65) 

follows from multiplication of (8.64) by H and use of m * H<PCf. Eq. ( 8.66) is obtained 
from (8.65) upon multiplication by and use of the orthonormality of The 

equation for p in (8.67) has been established in (8.38) and is repeated here only for 
convenience. The second of Eq. ( 8.67) follows from use of the identity <bj * Hx^. Eq. 

( 8.68) follows from the definition of p * p* in (8.38). Bq. ( 8.69) is established by 
performing the trace operation on (8.67). 


Result 8.25 


The residual process e * y-Hz Q can be represented as 


e = £ e.<bj, e^ = <|>.*e. (8.70) 

The spectral components e. are independent random variables with zero-mean and unit 
covariance, i.e., 


B(e.e.) = 0 for i »tj, B(e*) * 1. 


(8.71) 
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Furthermore, the spectral components e^ and of the residual and difference processes 
e = y-Hz Q and y - y-m are related by the reciprocal relationships 


e. = cosa. y. f y. = seca. e.. (8.72) 

Proof ; Eq. ( 8.70) is valid because 4> .are orthonormal in H To show ( 8.71) , observe 

that E(e.e.) = <t>* 3(ee*) 4>. and then use (8.4S) and the orthonormality of 4>.. Equations 

(8.72) are the spectral representations of the reciprocal relationships (8.47). Note that 
(8.72) can also be established by the simple trigonometric identity (1/cosol) * seccr. 

9. NUMERICAL SEARCH CALCULATION SUMMARY 

Since the development of the estimation- approach is rather lengthy, it is 
convenient to summarize the steps that are required to implement the numerical search. 
It is assumed that the process starts with a known input f, a set of data y and an 

initial parameter estimate 0 n . To conduct an iteration in the numerical search requires 
that the following steps be performed: 

1. Compute the suspected mean and covariance m = H$Cf and R « Hd>BB*0*H*. 

2. Compute the eigenvalues and eigenvectors of R. 

3. Compute the related vectors p^ * = B*p^ and 

4. Conduct a spectral analysis of the data and of the suspected mean to obtain the 
spectral coefficients y^. = 4> k *y *nd m^ = 4>^.*m. 

5. Use Result 6.5 to evaluate the gradient dJ/36 of the likelihood functionaL 

6. Use Results 6.6 and 6.7 to compute the Hessian M q and to determine the 
incremental change 68 n of the parameter estimates. 

7. Obtain a new parameter estimate 0 n+ * * 0 n - 60 n , return to step 1 above, and 
iterate through steps 1 to 6 until convergence is achieved. 

If Cramer- R so bounds and/or an optimal input are desired use (7.6) - (7.13). If the 
covariance of the state estimation error is desired use Result 8.2 and/or 8.13. 

The calculations involved in conducting a single iteration in the maximum-likelihood 
parameter estimation approach are summarized in block diagram form in Fig. 9.1 A 
single iteration consists of all of the computational steps required to obtain an updated 

parameter estimate 0 n+1 by processing the available data, the known deterministic 

input, and the current parameter estimate © n . 


371 



wuAH-ioot mrut 



Fig. 9.1 Calculations Required for Single Iteration in Modified Newton- Saphson Search 

To simplify the description of these computations, the steps performed in a single 
iteration have been grouped into the following four major blocks (delineated by the 
broken lines in the diagram): 

• a SQUARE-ROOT FILTER block that processes the measurement data y and 
the external input f to obtain a filtered estimate z and a corresponding 
residual process e, defined as the difference between the data and the 
filtered state estimate. The square-root filter implements the equations z » 
Ly 4 (I-L)m and e « y-z. The central computation in the square-root filter 

-Vi 

block is that provided by the operator L » I - (UR) defined in terms of the 
square-root of the predicted-data- covariance (I ♦ R). This operator appears 
in two distinct places in the diagram: in the data filter , whose primary 
function is to process the measurements y; and in the mean filter, whose 
main function is to process the suspected mean m. The suspected mean is in 
turn obtained from the known external input by means of the input-output 
model. 
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• * SQUARE- ROOT FILTER SENSITIVITY block that processes the 
measurement data y and the deterministic input f to obtain the filtered 
estimate sensitivity dz/36. This block implements the equation dz/86 = 
OL/36)(y-m) 4 (I-LWdm/39). The computation of the sensitivity dL/36 is the 
main calculation performed in this block. 

• a GRADIENT-HESSIAN SYNTHESIS block that forms: the function-space 
gradient dj/d6 of the likelihood functional by means of the equation 3J/36 = 
TrtOL/dOXUK) - (3z/30)e*]s and the function-space approximate Hessian by 
means of the equation M - Tr [ OL/aOXU RXdL */80)] 4 Oz/30)*Oz/ae). Note 
that the quantity that is actually evaluated in this block is the kernel M (x/£) 
of the Hessian operator. This kernel is a function of two spatial variables x 
and C defined over a "square* domain (x/£)cfi x Q, where Q is as before the 
spatial domain of definition of the system model. 

• a NEWTON- RAPHSON ITERATION block whose input is the gradient and the 
approximate Hessian and that generates as an output the updated parameter 

distribution 8 n+ 1 for the next iteration. The central calculation in this block 

is the solution of the integral equation M Q 66 n » g Q that results in the 

parameter estimate update 66 n . 

After specification of the parameter estimate 0 n+ ^, the square-root filter L(6°) and its 

sensitivity dUQ^/dO are redesigned by letting 0 n -*0 n+l , and the steps outlined above 
are repeated in order to conduct the next step in the iterative process for optimization. 

The predicted-data- covariance square-root filter processes the data y and the 

suspected mean m to produce a filtered state estimate z and a set of residuals e ■ y-z. 

_i/ t 

This is done by means of the equation z > Ly 4 (I-Lftn, where L * i-(UR) . This 
equation, while providing a very succinct symbolic description of the square-root filter, 
does not by itself provide a recipe to conduct computations. In order to provide such a 
recipe, it is convenient to use the corresponding spectral form z^ * (l-coso^)y^ 4 

cosa^m^, which expresses the spectral amplitudes z^ « $^,*z of the filtered state 

estimate z as a linear combination of the data and suspected mean spectral amplitudes 
y^ and m^. Such a spectral form of the predicted-data- covariance square-root filter is 

illustrated in Fig. 9.2. 

The diagram in the figure illustrates the main calculations involved in the square-root 
filter. On the upper branch of the diagram, a set of data^ y *. Iy^,...»yj t j] is assumed to 

[2] 

be available at N discrete locations. A spectral analysis is conducted on this data to 
obtain the data spectral amplitudes^ ly l ,...,y N ]. These spectral amplitudes are 

then multiplied by the coefficients (l-cosa^) in the data filter^, resulting in the terms 

(l-cosa^)y*. On the lower branch of the diagram, the deterministic inputs f| 6 ^ are 
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processed by the input/output system model^ to obtain the suspected mean 
m * (m 1 ,...,m N J*- 8 l The spectral amplitudes m * m PI of the suspected mean are 

then computed and subsequently multiplied by the coefficients cosa^ in the mean 

filter^ to produce the terms cosa^m* ^ . This last term is then added to 

(l-cosa^ly* in^ resulting in the filtered state spectral amplitudes and the 

1131 

residuals . Note that the physical state estimate z and the residual e can be 
recovered from z^ and e^ by means of the summations z « Ez^4>^ and e = 
although for simplicity this last transformation is not shown on the diagram. 



Fig. 9.2 Spectral Form of Predicted-Data-Covariance Square-Root Filter 

The foregoing remarks have scrutinized the spectral form of the square-root filter 
equation z = Ly 4 (I- Dm. The immediate aim now is to conduct a similar detailed 
analysis of the spectral representation of the square-root filter sensitivity equation 
dz/dO « OL/36)y 4 il-D (dm/38). The spectral form of this equation is stated in Bq. 
(6.15) and illustrated in the block diagram in Pig. 9.3. The overall primary 
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function of the square-root filter sensitivity is to process the N mean-centered data 

spectral amplitudes^ and the M deterministic inputs^ in order to obtain the spectral 
(31 

amplitudes of of the filtered state estimate sensitivity dz/86. An intermediate 
calculation embedded within this overall process involves processing of the 

mean-centered data spectral amplitudes y*^ by means of the N-by-N matrix, with 

general elements a^, representing the data filter sensitivity 8L/30^. Other 

intermediate steps involve: processing of the deterministic inputs f ^ by the 

xn 

input/output model sensitivity matrix b km 151 to generate the suspected mean spectral 
amplitudes (dm/36)^ and subsequent processing of these coefficients by the mean 
filter ^ to obtain the terms cosa^ Om/30)^ 



Fig. 9.3 Spectral Form of Square- Root Filter Sensitivity 

10 . CONCLUDING RBMARKS AND FUTURB DIRECTIONS 

The a of estimation for elliptic systems is so full of interesting research 
problems that, in spite of all that this paper has covered, much more remains to be 
done. These are some of the problems that lie ahead: 

• Conduct of an asymptotic statistical property analysis that explores the 
convergence of the parameter estimates as the number of observations 
increases. 
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• Development of approximation approaches that rigorously arrive at 

finite-dimensional approximations to the infinite-dimensional solutions 
advanced here. 

• More complete investigation of the optimal input design problem. In 

particulari development of "spectral" domain design approaches which would 
do for elliptic systems what the frequency domain methods achieve for linear 
time - invariant dynamical systems. 

• Development of more precise mathematical arguments to Justify 

function- space differentiation, eigensystem expansions, covariance 

calculations, likelihood-ratio derivations, etc. 

• Investigation of alternative (to the square-root) factorization of the 

predicted-data- covariance that could result in easier calculation of the 
function- space derivatives necessary for the Newton- Raphson search. 

• Numerical experimentation with the filtering, smoothing and identification 
algorithms to gain further insight into the state and parameter estimation 
approaches and solutions [5]. 

As a final remark, this paper is a concrete example of the power of the functional 
analysis approach to estimation advanced in Ref. [4]. Because of the conceptual 
simplicity of the method, it has been possible to solve in this paper problems that would 
have defied solution by any other method. It has also made it possible to conceive areas 
for future research that would otherwise have been left unidentified. 
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